3' GENOMIC PROMOTER REGION AND POLYMERASE GENE
MUTATIONS RESPONSIBLE FOR ATTENUATION IN VIRUSES
OF THE ORDER DESIGNATED MONONEGAVIRALES

Field Of The Invention

This invention relates to isolated, 
recombinantly-generated, attenuated, nonsegmented, 
negative-sense, single stranded RNA viruses of the
Order designated Mononegavirales having at least one 
attenuating mutation in the 3' genomic promoter region 
and having at least one attenuating mutation in the RNA 
polymerase gene. This invention was made with 
Government support under a grant awarded by the Public 
Health Service. The Government has certain rights in 
the invention. 

Background Of The Invention 

Enveloped, negative-sense, single stranded
RNA viruses are uniquely organized and expressed. The 
genomic RNA of negative-sense, single stranded viruses 
serves two template functions in the context of a 
nucleocapsid: as a template for the synthesis of 
messenger RNAs (mRNAs) and as a template for the 
synthesis of the antigenome (+) strand. Negative- 
sense, single stranded RNA viruses encode and package 
their own RNA-dependent RNA polymerase. Messenger RNAs
are only synthesized once the virus has been uncoated 
in the infected cell. Viral replication occurs after 
synthesis of the mRNAs and requires the continuous 
synthesis of viral proteins. The newly synthesized 
antigenome (+) strand serves as the template for
generating further copies of the (-) strand genomic 
RNA.

The polymerase complex actuates and achieves
transcription and replication by engaging the cis-
acting signals at the 3' end of the genome, in
particular, the promoter region. Viral genes are then
transcribed from the genome template unidirectionally
from its 3' to its 5' end. There is always less mRNA
made from the downstream genes (e.g., the polymerase
gene (L)) relative to their upstream neighbors (e.g.,
the nucleoprotein gene (N)). Therefore, there is always
a gradient of mRNA abundance according to the position
of the genes relative to the 3'-end of the genome.
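
By way of illustration only, the gradient described above can be sketched as a toy calculation. The gene order given is that of measles virus, and the per-junction fraction of 0.7 is a hypothetical value chosen for the example; neither the function name nor the numerical value is taken from this specification.

```python
# Toy model of the 3'-to-5' transcriptional gradient: at each gene junction
# only a fraction of the polymerases that reached it go on to transcribe the
# next gene. The 0.7 default is a hypothetical illustration, not a measurement.
GENE_ORDER = ["N", "P", "M", "F", "H", "L"]  # measles gene order, 3' to 5'


def relative_mrna_abundance(gene_order, reinitiation_fraction=0.7):
    """Return {gene: mRNA abundance relative to the 3'-proximal gene}."""
    if not 0.0 < reinitiation_fraction <= 1.0:
        raise ValueError("reinitiation_fraction must be in (0, 1]")
    return {
        gene: reinitiation_fraction ** index
        for index, gene in enumerate(gene_order)
    }


if __name__ == "__main__":
    for gene, level in relative_mrna_abundance(GENE_ORDER).items():
        print(f"{gene}: {level:.3f}")
```

Under any fraction below 1, the L gene at the 5' end is the least abundantly transcribed, which is the qualitative point made in the preceding paragraph.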

Based on the revised reclassification in 1993
by the International Committee on the Taxonomy of
Viruses, an Order, designated Mononegavirales, has been
established. This Order contains three families of
enveloped viruses with single stranded, nonsegmented
RNA genomes of minus polarity (negative-sense). These
families are the Paramyxoviridae, Rhabdoviridae and
Filoviridae. The family Paramyxoviridae has been
further divided into two subfamilies, Paramyxovirinae
and Pneumovirinae. The subfamily Paramyxovirinae
contains three genera, Paramyxovirus, Rubulavirus and
Morbillivirus. The subfamily Pneumovirinae contains
the genus Pneumovirus.

The new classification is based upon
morphological criteria, the organization of the viral
genome, biological activities and the sequence
relationships of the proteins. The morphological
distinguishing feature among enveloped viruses for the
subfamily Paramyxovirinae is the size and shape of the
nucleocapsids (diameter 18 nm, 1 µm in length, pitch of
5.5 nm), which have a left-handed helical symmetry. The
biological criteria are: 1) antigenic cross-reactivity
between members of a genus, and 2) the presence of
neuraminidase activity in the genera Paramyxovirus and
Rubulavirus and its absence in genus Morbillivirus. In
addition, variations in the coding potential of the P
gene are considered, as is the presence of an extra
gene (SH) in Rubulaviruses.

Pneumoviruses can be distinguished from 
Paramyxovirinae morphologically because they contain 
narrow nucleocapsids. In addition, pneumoviruses have
major differences in the number of protein-encoding
cistrons (10 in pneumoviruses versus 6 in 
Paramyxovirinae) and an attachment protein (G) that is 
very different from that of Paramyxovirinae. Although 
the paramyxoviruses and pneumoviruses have six proteins 
that appear to correspond in function (N, P, M, G/H/HN, 
F and L), only the latter two proteins exhibit
significant sequence relatedness between the two 
subfamilies. Several pneumoviral proteins lack 
counterparts in most of the paramyxoviruses, namely the 
nonstructural proteins NS1 and NS2, the small 
hydrophobic protein SH, and a second protein M2. Some
paramyxoviral proteins, namely C and V, lack 
counterparts in pneumoviruses. However, the basic 
genomic organization of pneumoviruses and 
paramyxoviruses is the same. The same is true of 
rhabdoviruses and filoviruses. Table 1 presents the 
current taxonomical classification of these viruses, 
together with examples of each genus. 

Table 1

Classification of Nonsegmented, Negative-Sense, Single
Stranded RNA Viruses of the Order Mononegavirales

Family Paramyxoviridae
  Subfamily Paramyxovirinae
    Genus Paramyxovirus
      Sendai virus (mouse parainfluenza virus type 1)
      Human parainfluenza virus (PIV) types 1 and 3
      Bovine parainfluenza virus (BPV) type 3
    Genus Rubulavirus
      Simian virus 5 (SV) (Canine parainfluenza virus type 2)
      Mumps virus
      Newcastle disease virus (NDV) (avian Paramyxovirus 1)
      Human parainfluenza virus types 2, 4a and 4b
    Genus Morbillivirus
      Measles virus (MV)
      Dolphin Morbillivirus
      Canine distemper virus (CDV)
      Peste-des-petits-ruminants virus
      Phocine distemper virus
      Rinderpest virus
  Subfamily Pneumovirinae
    Genus Pneumovirus
      Human respiratory syncytial virus (RSV)
      Bovine respiratory syncytial virus
      Pneumonia virus of mice
      Turkey rhinotracheitis virus
Family Rhabdoviridae
  Genus Lyssavirus
    Rabies virus
  Genus Vesiculovirus
    Vesicular stomatitis virus
  Genus Ephemerovirus
    Bovine ephemeral fever virus
Family Filoviridae
  Genus Filovirus
    Marburg virus

For many of these viruses, no vaccines of any
kind are available. Thus, there is a need to develop 
vaccines against such human and animal pathogens. Such 
vaccines would have to elicit a protective immune 
response in the recipient. The qualitative and 
quantitative features of such a favorable response are 
extrapolated from those seen in survivors of natural 
virus infection, who, in general, are protected from 
reinfection by the same or highly related viruses for 
some significant duration thereafter. 

A variety of approaches can be considered in 
seeking to develop such vaccines, including the use of: 
(1) purified individual viral protein vaccines (subunit 
vaccines); (2) inactivated whole virus preparations; 
and (3) live, attenuated viruses. 

Subunit vaccines have the desirable feature 
of being pure, definable and relatively easily produced 
in abundance by various means, including recombinant 
DNA expression methods. To date, with the notable 
exception of hepatitis B surface antigen, viral subunit 
vaccines have generally only elicited short-lived 
and/or inadequate immunity, particularly in naive 
recipients. 

Formalin inactivated whole virus preparations 
of polio (IPV) and hepatitis A have proven safe and 
efficacious. In contrast, immunization with similarly 
inactivated whole viruses such as respiratory syncytial 
virus and measles virus vaccines elicited unfavorable 
immune responses and/or response profiles which 
predisposed vaccinees to exaggerated or aberrant 
disease when subsequently confronted with the natural
or "wild-type" virus.

Early attempts (1966) were made to vaccinate young
children using a parenterally administered formalin-
inactivated RSV vaccine. Unfortunately, several field
trials of this vaccine revealed serious adverse
reactions -- the development of a severe illness with
unusual features following subsequent natural infection
with RSV (Bibliography entries 1,2). It has been
suggested that this formalinized RSV antigen elicited
an abnormal or unbalanced immune response profile,
predisposing the vaccinee to RSV disease (3,4).

Thereafter, live, attenuated RSV vaccine
candidates were generated by cold passage or chemical
mutagenesis. These RSV strains were found to have
reduced virulence in seropositive adults.
Unfortunately, they proved either over- or under-
attenuated when given to seronegative infants; in some
cases, they also were found to lack genetic stability
(5,6). Another vaccination approach using parenteral
administration of live virus was ineffective and
efforts along this line were discontinued (7).
Notably, these live RSV vaccines were never associated
with disease enhancement as observed with the formalin-
inactivated RSV vaccine described above. Currently,
there are no RSV vaccines approved for administration
to humans, although clinical trials are now in progress
with cold-passaged, chemically mutagenized strains of
RSV designated A2 and B-1.

Appropriately attenuated live derivatives of
wild-type viruses offer a distinct advantage as vaccine
candidates. As live, replicating agents, they initiate
infection in recipients during which viral gene
products are expressed, processed and presented in the
context of the vaccinee's specific MHC class I and II
molecules, eliciting humoral and cell-mediated immune
responses, as well as the coordinate cytokine patterns,
which parallel the protective immune profile of
survivors of natural infection.

This favorable immune response pattern is
contrasted with the delimited responses elicited by
inactivated or subunit vaccines, which typically are
largely restricted to the humoral immune surveillance
arm. Further, the immune response profile elicited by
some formalin-inactivated whole virus vaccines, e.g.,
measles and respiratory syncytial virus vaccines
developed in the 1960s, has not only failed to
provide sustained protection, but in fact has led to a
predisposition to aberrant, exaggerated, and even fatal
illness, when the vaccine recipient later confronted
the wild-type virus.

While live, attenuated viruses have highly 
desirable characteristics as vaccine candidates, they 
have proven to be difficult to develop. The crux of 
the difficulty lies in the need to isolate a derivative 
of the wild-type virus which has lost its disease-
producing potential (i.e., virulence), while retaining
sufficient replication competence to infect the 
recipient and elicit the desired immune response 
profile in adequate abundance. 

Historically, this delicate balance between 
virulence and attenuation has been achieved by serial 
passage of a wild-type viral isolate through different
host tissues or cells under varying growth conditions
(such as temperature). This process presumably favors
the growth of viral variants (mutants), some of which
have the favorable characteristic of attenuation. 
Occasionally, further attenuation is achieved through 
chemical mutagenesis as well. 

This propagation/passage scheme typically 
leads to the emergence of virus derivatives which are 
temperature sensitive, cold-adapted and/or altered in 
their host range -- one or all of which are changes
from the wild-type, disease-causing viruses -- i.e.,
changes that may be associated with attenuation. 

Several live virus vaccines, including those 
for the prevention of measles and mumps (which are 
paramyxoviruses), and for protection against polio and
rubella (which are positive strand RNA viruses), have
been generated by this approach and provide the 
mainstay of current childhood immunization regimens 
throughout the world. 

Nevertheless, this means for generating 
attenuated live virus vaccine candidates is lengthy 
and, at best, unpredictable, relying largely on the 
selective outgrowth of those randomly occurring genomic 
mutants with desirable attenuation characteristics. 
The resulting viruses may have the desired phenotype in 
vitro, and even appear to be attenuated in animal 
models. However, all too often they remain either 
under- or overattenuated in the human or animal host 
for whom they are intended as vaccine candidates. 

Even as to current vaccines in use, there is 
still a need for more efficacious vaccines. For 
example, the current measles vaccines provide 
reasonably good protection. However, recent measles 
epidemics suggest deficiencies in the efficacy of 
current vaccines. Despite maternal immunization, high 
rates of acute measles infection have occurred in 
children under age one, reflecting the vaccines' 
inability to induce anti-measles antibody levels 
comparable to those developed following wild-type
measles infection (8,9,10). As a result, vaccine- 
immunized mothers are less able to provide their 
infants with sufficient transplacentally-derived 
passive antibodies to protect the newborns beyond the 
first few months of life.

Acute measles infections in previously
immunized adolescents and young adults point to an
additional problem. These secondary vaccine failures
indicate limitations in the current vaccines' ability
to induce and maintain antiviral protection that is
both abundant and long-lived (11,12,13). Recently, yet
another potential problem was revealed. The
hemagglutinin protein of wild-type measles isolated
over the past 15 years has shown a progressively
increasing distance from the vaccine strains (14).
This "antigenic drift" raises legitimate concerns that 
the vaccine strains may not contain the ideal antigenic 
repertoire needed to provide optimal protection. Thus, 
there is a need for improved vaccines. 

Rational vaccine design would be assisted by 
a better understanding of these viruses, in particular, 
by the identification of the virally encoded 
determinants of virulence as well as those genomic 
changes which are responsible for attenuation. 

Summary Of The Invention 

Accordingly, it is an object of this 
invention to identify those regions of the genome of 
the RNA viruses of the Order Mononegavirales where
mutations result in attenuation of those viruses. 

It is a further object of this invention to 
produce recombinantly-generated viruses which 
incorporate such attenuating mutations in their 
genomes.

It is still a further object of this 
invention to formulate vaccines containing such 
attenuated viruses. 

These and other objects of the invention as 
discussed below are achieved by the generation and
isolation of recombinantly-generated, attenuated,
nonsegmented, negative-sense, single stranded RNA
viruses of the Order Mononegavirales having at least
one attenuating mutation in the 3' genomic promoter
region and having at least one attenuating mutation in
the RNA polymerase gene.

In the case of measles virus, at least one
attenuating mutation in the 3' genomic promoter region
is selected from the group consisting of nucleotide 26
(A -> T), nucleotide 42 (A -> T or A -> C) and
nucleotide 96 (G -> A), where these nucleotides, as
well as others delineated in this application (unless
stated otherwise), are presented in positive strand,
antigenomic, that is, message (coding) sense, and at
least one attenuating mutation in the RNA polymerase
gene is selected from the group consisting of
nucleotide changes which produce changes in an amino
acid selected from the group consisting of residues 331
(isoleucine -> threonine), 1409 (alanine -> threonine),
1624 (threonine -> alanine), 1649 (arginine ->
methionine), 1717 (aspartic acid -> alanine), 1936
(histidine -> tyrosine), 2074 (glutamine -> arginine)
and 2114 (arginine -> lysine).

In the case of human parainfluenza virus type
3, at least one attenuating mutation in the 3' genomic
promoter region is selected from the group consisting
of nucleotide 23 (T -> C), nucleotide 24 (C -> T),
nucleotide 28 (G -> T) and nucleotide 45 (T -> A), and
at least one attenuating mutation in the RNA polymerase
gene is selected from the group consisting of
nucleotide changes which produce changes in an amino
acid selected from the group consisting of residues 942
(tyrosine -> histidine), 992 (leucine ->
phenylalanine), 1292 (leucine -> phenylalanine), and
1558 (threonine -> isoleucine).

In the case of human respiratory syncytial
virus subgroup B, at least one attenuating mutation in
the 3' genomic promoter region is selected from the
group consisting of nucleotide 4 (C -> G) and the
insertion of an additional A in the stretch of A's at
nucleotides 6-11, and at least one attenuating mutation
in the RNA polymerase gene is selected from the group
consisting of nucleotide changes which produce changes
in an amino acid selected from the group consisting of
residues 353 (arginine -> lysine), 451 (lysine ->
arginine), 1229 (aspartic acid -> asparagine), 2029
(threonine -> isoleucine) and 2050 (asparagine ->
aspartic acid).
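
For illustration, the mutations recited above may be collected into a simple lookup, together with a check that a candidate carries at least one promoter mutation and at least one polymerase mutation. This is a minimal sketch: the dictionary layout, the virus keys and the function name are assumptions made for the example, and each change is read left to right as printed (wild-type residue first). Nucleotide changes are in antigenomic, message sense.

```python
# Attenuating changes as recited in the Summary. Layout and names are
# illustrative assumptions, not part of the specification.
# Promoter entries: (nucleotide position, wild-type base, mutant base).
PROMOTER_MUTATIONS = {
    "measles": {(26, "A", "T"), (42, "A", "T"), (42, "A", "C"), (96, "G", "A")},
    "PIV-3": {(23, "T", "C"), (24, "C", "T"), (28, "G", "T"), (45, "T", "A")},
    # The insertion of an extra A at nucleotides 6-11 is not a substitution
    # and is deliberately left out of this simple table.
    "RSV-B": {(4, "C", "G")},
}

# Polymerase entries: (L protein residue, wild-type amino acid, mutant).
L_PROTEIN_MUTATIONS = {
    "measles": {
        (331, "Ile", "Thr"), (1409, "Ala", "Thr"), (1624, "Thr", "Ala"),
        (1649, "Arg", "Met"), (1717, "Asp", "Ala"), (1936, "His", "Tyr"),
        (2074, "Gln", "Arg"), (2114, "Arg", "Lys"),
    },
    "PIV-3": {
        (942, "Tyr", "His"), (992, "Leu", "Phe"), (1292, "Leu", "Phe"),
        (1558, "Thr", "Ile"),
    },
    "RSV-B": {
        (353, "Arg", "Lys"), (451, "Lys", "Arg"), (1229, "Asp", "Asn"),
        (2029, "Thr", "Ile"), (2050, "Asn", "Asp"),
    },
}


def has_both_mutation_classes(virus, promoter_changes, polymerase_changes):
    """True if at least one listed promoter change AND one listed L change."""
    promoter_hits = set(promoter_changes) & PROMOTER_MUTATIONS[virus]
    polymerase_hits = set(polymerase_changes) & L_PROTEIN_MUTATIONS[virus]
    return bool(promoter_hits) and bool(polymerase_hits)


if __name__ == "__main__":
    print(has_both_mutation_classes(
        "measles", [(26, "A", "T")], [(331, "Ile", "Thr")]))
```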

In another embodiment of this invention, 
attenuated virus is used to prepare vaccines which 
elicit a protective immune response against the wild- 
type form of the virus. 

In yet another embodiment of this invention, 
an isolated, positive strand, antigenomic message sense 
nucleic acid molecule (or an isolated, negative strand 
genomic sense nucleic acid molecule) having the 
complete viral nucleotide sequence (whether of wild-
type virus or virus attenuated by non-recombinant
means) is manipulated by introducing one or more of the 
attenuating mutations described in this application to 
generate an isolated, recombinantly-generated 
attenuated virus. This virus is then used to prepare 
vaccines which elicit a protective immune response 
against the wild-type form of the virus.

In still another embodiment of this 
invention, such a complete wild-type or vaccine viral
nucleotide sequence is used: (1) to design PCR primers
for use in a PCR assay to detect the presence of the
corresponding virus in a sample; or (2) to design and
select peptides for use in an ELISA to detect the 
presence of the corresponding virus in a sample. 

Brief Description Of The Figures

Figure 1 depicts the passage history of the
Edmonston measles virus (15). The abbreviations have
the following meanings: HK - human kidney; HA - human
amnion; CE(am) - chick embryo; CEF - chick embryo
fibroblast; DK - dog kidney; WI-38 - human diploid
cells; SK - sheep kidney; * - plaque cloning. The
number following each abbreviation represents the
number of passages.

Figure 2 depicts a map of the measles virus
genome showing putative cis-acting regulatory elements
at and near the genome and antigenome termini. Top - a
schematic map of the measles virus genome, beginning at
the 3' end with 52 nucleotides of leader sequence (l)
and ending at the 5' terminus with 37 nucleotides of
trailer sequence (t). Gene boundaries are denoted by
vertical bars; below each gene is the number of
cistronic nucleotides. Bottom - an expanded schematic
view of the 3' extended genomic promoter regions of
genome and antigenome, showing the position and
sequence of the two highly conserved domains, A and B.
The intervening intergenic trinucleotide is denoted as
well. Nascent 5' RNAs encompassing the A' to B'
regions are presumed to contain the regulatory sequence
at which the N protein encapsidation initiates.

Figure 3 depicts a genetic map of the RSV
subgroup B wild-type strains designated 2B and 18537
(top portion), the intergenic sequences of those
strains (middle portion) and the 68 nucleotide overlap
between the M2 and L genes (bottom portion). The RSV
2B strain has six fewer nucleotides in the G gene,
encoding two fewer amino acid residues in the G
protein, as compared to the 18537 strain. The 2B
strain has 145 nucleotides in the 5' trailer region, as
compared to 149 nucleotides in the 18537 strain. The
2B strain has one more nucleotide in each of the NS-1,
NS-2 and N genes, and one fewer nucleotide in each of
the M and F genes, as compared to the 18537 strain.

Detailed Description Of The Invention

Transcription and replication of negative-
sense, single stranded RNA viral genomes are achieved
through the enzymatic activity of a multimeric protein
acting on the ribonucleoprotein core (nucleocapsid).
Naked genomic RNA cannot serve as a template. Instead,
these genomic sequences are recognized only when they
are entirely encapsidated by the N protein into the
nucleocapsid structure. It is only in that context
that the genomic and antigenomic terminal promoter
sequences are recognized to initiate the
transcriptional or replication pathways.

All paramyxoviruses require the two viral
proteins, L and P, for these polymerase pathways to
proceed. The pneumoviruses, including RSV, also
require the transcription elongation factor, M2, for
the transcriptional pathway to proceed efficiently.
Additional cofactors may also play a role, including
perhaps the virus-encoded NS1 and NS2 proteins, as well
as perhaps host-cell encoded proteins.

However, considerable evidence indicates that
it is the L protein which performs most, if not all,
the enzymatic processes associated with transcription
and replication, including initiation and termination
of ribonucleotide polymerization, capping and
polyadenylation of mRNA transcripts, methylation and
perhaps specific phosphorylation of P proteins. The L
protein's central role in genomic transcription and
replication is supported by its large size, sensitivity
to mutations, and its catalytic level of abundance in
the transcriptionally active viral complex (16).

These considerations led to the proposal that
L proteins consist of a linear array of domains whose
concatenated structure integrates discrete functions
(17). Indeed, three such delimited, discrete elements
within the negative-sense virus L protein have been
identified based on their relatedness to defined
functional domains of other well-characterized
proteins. These include: (1) a putative RNA template
recognition and/or phosphodiester bond formation
domain; (2) an RNA binding element; and (3) an ATP
binding domain. All prior studies of L proteins of
nonsegmented negative-sense, single stranded RNA
viruses have revealed these putative functional
elements (17).

Without being bound by the following, it is
reasonable to presume that these non-protein coding,
promoter and other cis-acting genomic regulatory
domains are important determinants of the efficiency
with which transcription and replication by measles
virus (MV) and other viruses of the Order
Mononegavirales are actualized, in association with the
L protein, and that they may therefore be virulence
determinants for these viruses as well.

In summary, the invention is believed to
encompass a coordinate set of changes between the cis-
acting regulatory signal (3' genomic promoter region)
and the polymerase gene (L) which results in
attenuation of the virus while retaining sufficient
ability of the virus to replicate. Attenuation is
optimized by rational mutations of the 3' genomic
promoter region and the polymerase gene, which provide
the desired balance of replication efficiency, so that
the virus vaccine is no longer able to produce disease,
yet retains its capacity to infect the vaccinee's
cells, to express sufficiently abundant gene products
to elicit the full spectrum and profile of desirable
immune responses, and to reproduce and disseminate
sufficiently to maximize the abundance of the immune
response elicited.

Without being bound by the following, 
attenuating mutations in the extended promoter (3'
genomic promoter region) and in the polymerase gene are
believed to affect the display of cis-acting signals 
and the conformation of the polymerase complex engaging 
these signals. For example, when encapsidated, the 
promoter RNA is coiled in a helical array. Changes in 
promoter sequence may affect the relative positions at 
which the conserved signals are displayed relative to 
one another. Specifically, the measles wild-type 3'
genomic promoter region has a pyrimidine (uracil) at
positions 26 and 42 (the antigenomic message sense
sequences have the purine adenine). The vaccine
strains have purines at those positions (the
antigenomic message sense sequences have the
corresponding pyrimidines; see Table 3 in Example 1
below). The larger purines may change the distance
and/or angular display between the conserved domains of
the promoter (e.g., in measles, positions 1-11 and 87-
98), resulting in an altered spatial presentation of
the cis-acting signals to the polymerase.
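
The relationship between the two senses at positions 26 and 42 can be checked with a short sketch. It assumes only standard base pairing; antigenomic bases are written in the DNA alphabet used elsewhere in this application (A, C, G, T), and the function names are hypothetical.

```python
# Map an antigenomic (message-sense) base to the complementary base carried
# by the genomic (negative-sense) RNA, then classify it as purine/pyrimidine.
DNA_TO_GENOMIC_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
PURINES = {"A", "G"}


def genomic_base(antigenomic_base):
    """Complement a message-sense base to the genomic RNA base."""
    return DNA_TO_GENOMIC_RNA[antigenomic_base]


def base_class(rna_base):
    """Return 'purine' for A or G, otherwise 'pyrimidine'."""
    return "purine" if rna_base in PURINES else "pyrimidine"


def describe(position, wild_type, vaccine):
    """Summarize a message-sense substitution as seen on the genomic strand."""
    wt, vac = genomic_base(wild_type), genomic_base(vaccine)
    return (f"position {position}: genomic {wt} ({base_class(wt)}) -> "
            f"{vac} ({base_class(vac)})")


if __name__ == "__main__":
    # Measles promoter substitutions recited in the Summary (message sense).
    for pos, wt, vac in [(26, "A", "T"), (42, "A", "T"), (42, "A", "C")]:
        print(describe(pos, wt, vac))
```

Each of the three substitutions replaces a genomic uracil with a purine (adenine or guanine), consistent with the statement above that the vaccine strains carry purines at those genomic positions.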

Animal studies have demonstrated a decrease 
in viral replication sufficient to avoid illness but 
adequate to elicit the desired immune response. This 
likely represents a decrease in transcription, a
decrease in gene expression of virally encoded
proteins, a decrease in antisense templates and,
therefore, the production of fewer new genomes. The
resulting attenuated viruses are significantly less
virulent than the wild-type.

The attenuating mutations described herein 
may be introduced into viral strains by two methods: 

(1) Conventional means such as chemical
mutagenesis during virus growth in cell cultures to
which a chemical mutagen has been added, selection of
virus that has been subjected to passage at suboptimal
temperature in order to select temperature sensitive
and/or cold adapted mutations, identification of mutant
viruses that produce small plaques in cell culture, and
passage through heterologous hosts to select for host
range mutations. These viruses are then screened for
attenuation of their biological activity in an animal
model. Attenuated viruses are subjected to nucleotide
sequencing of their 3' genomic promoter region and
polymerase genes to locate the sites of attenuating
mutations. Once this has been done, method (2) is then
carried out.

(2) A preferred means of introducing
attenuating mutations comprises making predetermined
mutations using site-directed mutagenesis. These
mutations are identified either by method (1) or by
reference to closely-related viruses whose attenuating
mutations are already known. One or more mutations are
introduced into each of the 3' genomic promoter region
and the polymerase gene. Cumulative effects of
different combinations of coding and non-coding changes
can also be assessed.
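
As a purely in silico illustration of a predetermined point change under method (2), the sketch below applies a substitution at a 1-based position of a cDNA string after confirming that the expected wild-type base is present. The sequence is a made-up placeholder, not viral sequence, and the function name is hypothetical.

```python
def apply_point_mutation(sequence, position, expected, replacement):
    """Return a copy of `sequence` with one base substituted.

    `position` is 1-based, matching the nucleotide numbering used in this
    application. A ValueError is raised if the base found at that position
    is not the expected wild-type base.
    """
    index = position - 1
    if not 0 <= index < len(sequence):
        raise IndexError(f"position {position} is outside the sequence")
    found = sequence[index]
    if found != expected:
        raise ValueError(
            f"expected {expected} at position {position}, found {found}")
    return sequence[:index] + replacement + sequence[index + 1:]


# Placeholder 50-mer with A at positions 26 and 42 (NOT a real promoter).
TOY_PROMOTER = "G" * 25 + "A" + "G" * 15 + "A" + "G" * 8

if __name__ == "__main__":
    mutant = apply_point_mutation(TOY_PROMOTER, 26, "A", "T")
    mutant = apply_point_mutation(mutant, 42, "A", "C")
    print(mutant)
```

Checking the expected base before substituting guards against numbering errors, such as applying message-sense coordinates to a genomic-sense sequence.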

The mutations to the 3' genomic promoter
region and polymerase gene are introduced by standard
recombinant DNA methods into a DNA copy of the viral
genome. This may be a wild-type or a modified viral
genome background (such as viruses modified by method 
(1)), thereby generating a new virus. Infectious 
clones or particles containing these attenuating 
mutations are generated using the cDNA "rescue" system, 
which has been applied to a variety of viruses, 
including Sendai virus (18); measles virus (19); 
respiratory syncytial virus (20); rabies (21); 
vesicular stomatitis virus (VSV) (15); and rinderpest 
virus (23); these references are hereby incorporated by 
reference. See, for measles virus rescue, published 
International patent application WO 97/06270, 
designating the United States (24); for PIV-3 rescue, 
U.S. provisional patent application 60/047575 (25); for 
RSV rescue, published International patent application 
WO 97/12032, designating the United States (26); these 
applications are hereby incorporated by reference. 

Briefly, all Mononegavirales rescue systems 
can be summarized as follows: Each requires a cloned 
DNA equivalent of the entire viral genome placed 
between a suitable DNA-dependent RNA polymerase
promoter (e.g., the T7 RNA polymerase promoter) and a
self-cleaving ribozyme sequence (e.g., the hepatitis
delta ribozyme) which is inserted into a propagatable 
bacterial plasmid. This transcription vector provides 
the readily manipulable DNA template from which the RNA 
polymerase (e.g., T7 RNA polymerase) can faithfully 
transcribe a single-stranded RNA copy of the viral
antigenome (or genome) with the precise, or nearly 
precise, 5' and 3' termini. The orientation of the 
viral genomic DNA copy and the flanking promoter and 
ribozyme sequences determines whether antigenome or
genome RNA equivalents are transcribed. Also required 
for rescue of new virus progeny are the virus-specific 
trans-acting proteins needed to encapsidate the naked,
single-stranded viral antigenome or genome RNA
transcripts into functional nucleocapsid templates: 
the viral nucleocapsid (N or NP) protein, the 
polymerase-associated phosphoprotein (P) and the 
polymerase (L) protein. These proteins comprise the 
active viral RNA-dependent RNA polymerase which must 
engage this nucleocapsid template to achieve 
transcription and replication. 
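
The layout of the transcription vector insert, and the effect of orientation on which sense is transcribed, can be sketched with plain strings. This is a schematic only: the promoter string is the commonly cited T7 promoter core and is used illustratively, the ribozyme is a placeholder, the toy cDNA is not viral sequence, and all names are assumptions made for the example.

```python
# Schematic of the cassette: promoter + viral cDNA + self-cleaving ribozyme.
T7_PROMOTER = "TAATACGACTCACTATAG"   # commonly cited core; illustrative only
RIBOZYME_PLACEHOLDER = "N" * 10      # stands in for a ribozyme sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.translate(_COMPLEMENT)[::-1]


def build_transcription_cassette(antigenome_cdna, sense="antigenome",
                                 promoter=T7_PROMOTER,
                                 ribozyme=RIBOZYME_PLACEHOLDER):
    """Assemble the insert so that the chosen sense is transcribed.

    `antigenome_cdna` is the message-sense cDNA. For sense="genome" the
    viral cDNA is placed in the opposite orientation, so the transcript
    corresponds to the negative-sense genome.
    """
    if sense == "antigenome":
        insert = antigenome_cdna
    elif sense == "genome":
        insert = reverse_complement(antigenome_cdna)
    else:
        raise ValueError("sense must be 'antigenome' or 'genome'")
    return promoter + insert + ribozyme


if __name__ == "__main__":
    toy_cdna = "ACCAAA"  # placeholder, not viral sequence
    print(build_transcription_cassette(toy_cdna, sense="antigenome"))
    print(build_transcription_cassette(toy_cdna, sense="genome"))
```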

The trans-acting proteins required for
measles virus rescue are the encapsidating protein N, 
and the polymerase complex proteins, P and L. For PIV- 
3, the encapsidating protein is designated NP, and the 
polymerase complex proteins are also referred to as P 
and L. For RSV, the virus-specific trans-acting 
proteins include N, P and L, plus an additional 
protein, M2, the RSV-encoded transcription elongation
factor. 

Typically, these viral trans-acting proteins 
are generated from one or more plasmid expression 
vectors encoding the required proteins, although some 
or all of the required trans-acting proteins may be 
produced within mammalian cells engineered to contain 
and express these virus-specific genes and gene
products as stable transformants.

The typical (although not necessarily 
exclusive) circumstances for rescue include an 
appropriate mammalian cell milieu in which T7
polymerase is present to drive transcription of the
antigenomic (or genomic) single-stranded RNA from the
viral genomic cDNA-containing transcription vector.
Either cotranscriptionally or shortly thereafter, this
viral antigenome (or genome) RNA transcript is
encapsidated into functional templates by the
nucleocapsid protein and engaged by the required
polymerase components produced concurrently from co-
transfected expression plasmids encoding the required
virus-specific trans-acting proteins. These events and
processes lead to the prerequisite transcription of
viral mRNAs, the replication and amplification of new
genomes and, thereby, the production of novel viral
progeny, i.e., rescue.

For the rescue of rabies, VSV and Sendai, T7 
polymerase is provided by recombinant vaccinia virus 
vTF7-3. This system, however, requires that the
rescued virus be separated from the vaccinia virus by 
physical or biochemical means or by repeated passaging 
in cells or tissues that are not a good host for 
poxvirus. For MV cDNA rescue, this requirement is 
avoided by creating a cell line that expresses T7 
polymerase, as well as viral N and P proteins. Rescue 
is achieved by transfecting the genome expression 
vector and the L gene expression vector into the helper 
cell line. Advantages of the host-range mutant of the
vaccinia virus, MVA-T7, which expresses the T7 RNA 
polymerase, but does not replicate in mammalian cells, 
are exploited to rescue RSV, Rinderpest virus and MV. 
After simultaneous expression of the necessary 
encapsidating proteins, synthetic full length 
antigenomic viral RNA are encapsidated, replicated and 
transcribed by viral polymerase proteins and replicated 
genomes are packaged into infectious virions. In 
addition to such antigenomes, genome analogs have now 
been successfully rescued for Sendai and P1V-3 (25,27). 

The rescue system thus provides a composition 
which comprises a transcription vector comprising an 
isolated nucleic acid molecule encoding a genome or 
antigenome of a nonsegmented, negative-sense, single
stranded RNA virus of the Order Mononegavirales having
at least one attenuating mutation in the 3' genomic
promoter region and having at least one attenuating
mutation in the RNA polymerase gene, together with at
least one expression vector which comprises at least
one isolated nucleic acid molecule encoding the
trans-acting proteins necessary for encapsidation,
transcription and replication (e.g., N, P and L for
measles virus; NP, P and L for PIV-3; N, P, L and M2
for RSV). Host cells are then transformed or
transfected with the at least two expression vectors
just described. The host cells are cultured under
conditions which permit the co-expression of these
vectors so as to produce the infectious attenuated
virus.

The rescued infectious virus is then tested
for its desired phenotype (temperature sensitivity,
cold adaptation, plaque morphology, and transcription
and replication attenuation), first by in vitro means.
The mutations at the cis-acting 3' genomic promoter
region are also tested using the minireplicon system
where the required trans-acting encapsidation and
polymerase activities are provided by wild-type or
vaccine helper viruses, or by plasmids expressing the
N, P and different L genes harboring gene-specific
attenuating mutations (19,28).

If the attenuated phenotype of the rescued
virus is present, challenge experiments are conducted
with an appropriate animal model. Non-human primates
provide the preferred animal model for the pathogenesis
of human disease. These primates are first immunized
with the attenuated, recombinantly-generated virus,
then challenged with the wild-type form of the virus.
Monkeys are infected by various routes, including but
not limited to intranasal, intratracheal or
subcutaneous routes of inoculation (29).
Experimentally infected rhesus and cynomolgus macaques
have also served as animal models for studies of
vaccine-induced protection against measles (30).
Protection is measured by such criteria as disease
signs and symptoms, survival, virus shedding and
antibody titers. If the desired criteria are met, the
attenuated, recombinantly-generated virus is considered
a viable vaccine candidate for testing in humans. The
"rescued" virus is considered to be
"recombinantly-generated", as are the progeny and later
generations of the virus, which also incorporate the
attenuating mutations.

Even if a "rescued" virus is underattenuated
or overattenuated relative to optimum levels for 
vaccine use, this is information which is valuable for 
developing such optimum strains. 

Optimally, a codon containing an attenuating 
point mutation may be stabilized by introducing a 
second or a second plus a third mutation in the codon 
without changing the amino acid encoded by the codon 
bearing only the attenuating point mutation. 
Infectious virus clones containing the attenuating and 
stabilizing mutations are also generated using the cDNA 
"rescue" system described above. 

Measles virus serves as a useful model for 
this invention, because sequence data are now available 
as described herein for the disease-causing wild-type
virus and for the disease-preventing vaccines which 
have a demonstrated history of efficacy. 

Measles virus was first isolated in tissue 
culture in 1954 (31) from an infected patient named 
David Edmonston. This Edmonston strain of measles
became the progenitor for many live-attenuated measles
vaccines, including Moraten, which is the current
vaccine in the United States (Attenuvax™; Merck Sharp &
Dohme, West Point, PA); it was licensed in 1968 and has
proven to be efficacious.

Aggressive immunization programs instituted
in the mid to late 1960s resulted in a precipitous
drop in reported measles cases from nearly 700,000 in
1965 to 1500 in 1983. In parallel, other vaccine
strains were also developed from the Edmonston strain
(see Fig. 1): Schwarz (Institut Merieux, Lyon, France),
Zagreb (Zagreb, Yugoslavia) and AIK-C (Japan). These
other vaccines have also proven to be efficacious and
have been used extensively. An early, reactogenic,
underattenuated vaccine strain (Rubeovax™; Merck Sharp
& Dohme) produced measles-like illness in children and
its use thus was discontinued. It, however, was
further attenuated successfully to produce the Moraten
vaccine strain (see Fig. 1) (32). Live measles virus
vaccine provides a success story of the development of
an efficacious vaccine and provides a model for
understanding the molecular mechanisms of viral vaccine
attenuation among nonsegmented, negative-sense, single
stranded RNA viruses.

Because of its significance as a major cause
of human morbidity and mortality, measles virus (MV)
has been quite extensively studied. MV is a large,
relatively spherical, enveloped particle composed of
two compartments, a lipoprotein membrane and a
ribonucleoprotein particle core, each having distinct
biological functions (33). The virion envelope is a
host cell-derived plasma membrane modified by three
virus-specified proteins: The hemagglutinin (H;
approximately 80 kilodaltons (kD)) and fusion (F1,2;
approximately 60 kD) glycoproteins project on the
virion surface and confer host cell attachment and
entry capacities to the viral particle (16).
Antibodies to H and/or F are considered protective
since they neutralize the virus' ability to initiate
infection (34,35,36). The matrix (M; approximately 37
kD) protein is the amphipathic protein lining the
membrane's inner surface, which is thought to
orchestrate virion morphogenesis and thus consummate
virus reproduction (37). The virion core contains the
15,894 nucleotide long genomic RNA upon which template
activity is conferred by its intimate association with
approximately 2600 molecules of the approximately 60 kD
nucleocapsid (N) protein (38,39,40). Loosely
associated with this approximately one micron long
helical ribonucleoprotein particle are enzymatic levels
of the viral RNA-dependent RNA polymerase (L;
approximately 240 kD) which, in concert with the
polymerase cofactor (P; approximately 70 kD), and
perhaps yet other virus-specified as well as
host-encoded proteins, transcribes and replicates the
MV genome sequences (41).

To date, the entire nucleotide sequences
(only for the Edmonston B laboratory strain and the
AIK-C vaccine strain), coding potential, and
organization of the MV genome have been reported (33).
The six virion structural proteins are encoded by six
contiguous, non-overlapping genes which are arrayed as
follows: 3'-N-P-M-F-H-L-5'. Two additional MV gene
products of as yet uncertain function have also been
identified. These two nonstructural proteins, known as
C (approximately 20 kD) and V (approximately 45 kD),
are both encoded by the P gene, the former by a second
reading frame within the P mRNA; the latter by a
cotranscriptionally edited P gene-derived mRNA which
encodes a hybrid protein having the amino terminal
sequences of P and a new zinc finger-like cysteine-rich
carboxy terminal domain (16).

In addition to the sequences encoding the 
virus-specified proteins, the MV genome contains 
distinctive non-protein coding domains resembling those
directing the transcriptional and replicative pathways
of related viruses (16,42). These regulatory signals
lie at the 3' and 5' ends of the MV genome and in short
internal regions spanning each intercistronic boundary. 
The former encode the putative promoter and/or 
regulatory sequence elements directing genomic 
transcription, genome and antigenome encapsidation, and 
replication. The latter signal transcription 
termination and polyadenylation of each monocistronic 
viral mRNA and then reinitiation of transcription of 
the next gene. In general, the MV polymerase complex 
appears to respond to these signals much as the 
RNA-dependent RNA polymerases of other nonsegmented
negative strand RNA viruses (16,42,43,44). 

Transcription initiates at or near the 3' end
of the MV genome and then proceeds in a 5' direction
producing monocistronic mRNAs (40,42,45). As the
polymerase traverses the MV genomic template, it
encounters putative stop/start signals which, in 3' to
5' order, are: a semi-conserved transcription
termination/polyadenylation signal (A/G U/C UA A/U NN
A4, where N may be any of the four bases) at which each
monocistronic RNA is completed; a non-transcribed
intergenic trinucleotide punctuation mark (CUU; except
at the H:L boundary where it is CGU); and a
semi-conserved start signal for transcription initiation
of the next gene (AGG A/G NN C/A A A/G G A/U, where N
may be any of the four bases) (45,46). Since some
polymerase complexes fail to reinitiate, the abundance
of each MV mRNA diminishes in parallel with the
distance of the encoding gene from the genomic 3' end.
This mRNA gradient directly corresponds to the relative
abundance of each virus-specified protein. This
indicates that MV protein expression is ultimately
controlled at the transcriptional level (44).
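
The consensus signals and the mRNA gradient described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names, the demonstration sequence and the reinitiation probability of 0.7 are hypothetical and are not taken from the sequences or data of this description. The two patterns transcribe the consensus signals exactly as written in the text (RNA alphabet, N being any base), and the gradient function assumes that a fixed fraction of polymerase complexes reinitiates at each gene junction.

```python
import re

# Consensus signals as written in the text (RNA alphabet, N = any base).
GENE_END = re.compile(r"[AG][UC]UA[AU][ACGU]{2}A{4}")
GENE_START = re.compile(r"AGG[AG][ACGU]{2}[CA]A[AG]G[AU]")


def find_signals(rna):
    """Return (kind, start_index) pairs for each consensus match, in order."""
    hits = [("end", m.start()) for m in GENE_END.finditer(rna)]
    hits += [("start", m.start()) for m in GENE_START.finditer(rna)]
    return sorted(hits, key=lambda hit: hit[1])


def mrna_gradient(genes, p_reinitiate):
    """Relative mRNA abundance per gene, in 3' to 5' gene order.

    Assumes (hypothetically) that the same fraction p_reinitiate of
    polymerase complexes resumes transcription at every junction.
    """
    return {gene: p_reinitiate ** i for i, gene in enumerate(genes)}


# Hypothetical junction: gene-end signal, CUU intergenic, gene-start signal.
demo = "GG" + "AUUAUCGAAAA" + "CUU" + "AGGAUUCAAGA" + "CC"
print(find_signals(demo))
print(mrna_gradient(["N", "P", "M", "F", "H", "L"], 0.7))
```

With any reinitiation probability below one, the L mRNA, being the most distal from the genomic 3' end, is the least abundant, which is the qualitative behavior the text describes.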
The 3' and 5' MV genomic termini contain
non-protein coding sequences with distinct parallels to
the leader and trailer RNA encoding regions of VSV
(42). Nucleotides 1-55 define the region between the
genomic 3' terminus and the beginning of the N gene,
while 37 additional nucleotides can be found between
the end of the L gene and the 5' terminus of the
genome. However, unlike VSV, or even the
paramyxoviruses Sendai and NDV, MV does not transcribe
these terminal regions into short, unmodified (+) or
(-) sense leader RNAs (47,48,49). Instead, leader
readthrough transcripts, including full-length
polyadenylated leader:N, leader:N:P, leader:N:P:M, and
of course full-length antigenome MV RNAs are
transcribed (48,49). Thus, the short leader
transcript, the key operational element determining the
switch from transcription to replication of the VSV
single-stranded, negative polarity genome (50,51,52),
seems absent in MV. This leads to consideration and
exploration of alternative models for this crucial
reproductive event (42).

Measles virus, as well as all other
Mononegavirales except the rhabdoviruses, appears to
have extended its terminal regulatory domains beyond
the confines of leader and trailer encoding sequences
(42). For measles, these regions encompass the 107 3'
genomic nucleotides (the "3' genomic promoter region",
also referred to as the "extended promoter", which
comprises 52 nucleotides encoding the leader region,
followed by three intergenic nucleotides, and 52
nucleotides encoding the 5' untranslated region of N
mRNA) and the 109 5' end nucleotides (69 encoding the
3' untranslated region of L mRNA, the intergenic
trinucleotide and 37 nucleotides encoding the trailer).
Within these 3' terminal approximately 100 nucleotides
of both the genome and antigenome are two short regions
of shared nucleotide sequence: 14 of 16 nucleotides at
the absolute 3' ends of the genome and antigenome are
identical. Internal to those termini, an additional
region of 12 nucleotides of absolute sequence identity
has been located. Their position at and near the
sites at which the transcription of the MV genome must
initiate and replication of the antigenome must begin
suggests that these short unique sequence domains
encompass an extended promoter region.
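
The composition of the two terminal regions stated above can be checked with simple arithmetic. The sketch below is illustrative only; the names are hypothetical, and it assumes that positions are counted from 1 starting at the genomic 3' terminus. It sums the stated segment lengths and reports which segment a given promoter position falls in.

```python
# Segment lengths (in nucleotides) as stated in the text for measles virus.
PROMOTER_3P = [("leader", 52), ("intergenic", 3), ("N mRNA 5' UTR", 52)]
END_5P = [("L mRNA 3' UTR", 69), ("intergenic", 3), ("trailer", 37)]


def total_length(segments):
    """Sum of the segment lengths."""
    return sum(length for _, length in segments)


def segment_at(segments, position):
    """Name of the segment containing a 1-based position, or None if outside."""
    start = 1
    for name, length in segments:
        if start <= position < start + length:
            return name
        start += length
    return None


print(total_length(PROMOTER_3P))  # 52 + 3 + 52 = 107
print(total_length(END_5P))       # 69 + 3 + 37 = 109
print(segment_at(PROMOTER_3P, 96))
```

Under this numbering assumption, the leader plus the intergenic trinucleotide accounts for nucleotides 1-55, consistent with the region described between the genomic 3' terminus and the beginning of the N gene.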

These discrete sequence elements may dictate
alternative sites of transcription initiation, the
internal domain mandating transcription initiation at
the N gene start site, and the 3' terminal domain
directing antigenome production (42,48,53). In
addition to their regulatory role as cis-acting
determinants of transcription and replication, these 3'
extended genomic and antigenomic promoter regions
encode the nascent 5' ends of antigenome and genome
RNAs, respectively. Within these nascent RNAs reside
as yet unidentified signals for N protein nucleation,
another key regulatory element required for
nucleocapsid template formation and consequently for
amplification of transcription and replication. Figure
2 schematically shows the location and sequence of
these highly conserved, putative cis-acting regulatory
domains.

Terminal non-protein coding regions similar
in location, size and spacing are present in the
genomes of other members of the family Paramyxoviridae,
though only 8-11 of their absolute terminal nucleotides
are shared by MV (42,54). The genomic termini of the
Morbillivirus canine distemper virus (CDV) display a
greater degree of homology with those of its MV
relative: 73% of the nucleotides of the leader and
trailer sequences of these two viruses are identical,
including 16 of 18 at the absolute 3' termini and 17 of
18 at their 5' ends (55). No accessory internal CDV
genomic domain sharing homology to that of the MV
extended promoter has been found. However, there is a
20 nucleotide long stretch lying between CDV genomic
nucleotides 85 and 104 and 15,587 and 15,606 in which
15 of the 20 nucleotides are complementary (GenBank
accession number AF 14953). This indicates that CDV,
like MV, contains an additional region within its
non-coding 3' genomic and antigenomic ends that may
provide important cis-acting promoter and/or regulatory
signals (55).

Additionally, the precise length of the 3'-leader
region (55 nucleotides) is identical among
several members of the Family Paramyxoviridae (MV, CDV, 
PIV-3, BPV-3, SV and NDV) . Further evidence for the 
importance of these extended, non-protein coding 
regions comes from analyses of a large number of 
distinct copy-back Defective Interfering Viruses (DIs) 
recently cloned from subacute sclerosing 
panencephalitis (SSPE) brain tissue. No DI with a stem 
shorter than the 95 5' terminal genomic nucleotides was 
found. This indicates that the minimal signals needed 
for MV DI RNA replication and encapsidation extend well 
beyond the 37 nucleotide long trailer sequence to 
encompass the additional internal putative regulatory 
domain (56) . 

As exemplified in part by measles virus, this
invention is directed to the concept that important
virulence/attenuation determinants reside in viral
genomic non-protein coding regulatory regions and in
the trans-acting transcription/replication enzyme
complex with which these cis-acting elements must
interact. The cis-acting domains are found both at the
3' and 5' ends of the MV genome, flanking the six
contiguous genes encoding viral structural proteins,
and within the MV genome as short regions encompassing
internal intergenic boundaries. The former encode the
putative promoter and/or regulatory sequence elements
directing the vital processes of genomic transcription,
genome and antigenome encapsidation, and replication.
The latter signal transcription termination and
polyadenylation of each monocistronic viral mRNA and
then reinitiation of transcription of the next gene.
The transcription/replication enzyme, the RNA-dependent
RNA polymerase molecule, can modulate transcription
and/or replicative efficiency, thereby determining the
abundance of cytopathic viral gene products and/or
virion progeny.

Proof of the concept of this invention for
measles virus is obtained by first determining the
nucleotide sequences of the non-coding regulatory
regions (3' genomic promoter region) and the coding
regions of the L gene (with predicted amino acid
sequences) of the progenitor Edmonston wild-type MV
isolate, together with available measles vaccine
strains derived from this isolate (see Figure 1).
Other independent wild-type isolates were examined for
comparative purposes as well.

The nucleotide sequences (in positive strand,
antigenomic, message sense) of four wild-type and five
vaccine measles strains, as well as the deduced amino
acid sequences of the RNA polymerase (L protein) of
these measles viruses, are set forth as follows with
reference to the appropriate SEQ ID NOS. contained
herein:



SUBSTITUTE SHEET (RULE 26) 



WO 98/13501 



PCT/US97/16718 



29 



Virus            Nucleotide Sequence    L Protein Sequence

Wild-Type
Edmonston        SEQ ID NO: 1           SEQ ID NO: 2
1977             SEQ ID NO: 3           SEQ ID NO: 4
1983             SEQ ID NO: 5           SEQ ID NO: 6
Montefiore       SEQ ID NO: 7           SEQ ID NO: 8

Vaccine
Rubeovax™        SEQ ID NO: 9           SEQ ID NO: 10
Moraten          SEQ ID NO: 11          SEQ ID NO: 12
Zagreb           SEQ ID NO: 13          SEQ ID NO: 14
AIK-C            SEQ ID NO: 15          SEQ ID NO: 16

Each measles virus genome listed above is 
15,894 nucleotides in length. Translation of the L 
gene starts with the codon at nucleotides 9234-9236; 
the translation stop codon is at nucleotides 15783- 
15785. The translated L protein is 2,183 amino acids 
long. 
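
The stated L protein length follows directly from the stated coordinates. As a minimal sketch (the function name is hypothetical), the number of amino acids is the number of codons spanned from the first nucleotide of the start codon to the last nucleotide of the stop codon, less one for the untranslated stop codon.

```python
def protein_length(start_codon_first, stop_codon_last):
    """Amino acids encoded by an ORF given 1-based inclusive coordinates."""
    span = stop_codon_last - start_codon_first + 1
    if span % 3:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3 - 1  # the stop codon is not translated


# Measles virus L gene: start codon at 9234-9236, stop codon at 15783-15785.
print(protein_length(9234, 15785))  # 6552 nt = 2184 codons -> 2183 amino acids
```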

Note that nucleotide 2499 of 1983 wild-type
measles virus is indicated as "G" in SEQ ID NO: 5. In
fact, the base is actually a mixture of "G" and "C".
Also note that nucleotide 2143 of Rubeovax™ vaccine
virus is indicated as "T" in SEQ ID NO: 9. In nine
clones sequenced, this base was "T" in seven and "C" in
two; thus, this base can be "T" or "C".

In addition, the Schwarz vaccine virus genome
is identical to that of the Moraten vaccine virus
genome (SEQ ID NO: 11), except that at nucleotides 4917
and 4924, Schwarz has a "C" instead of a "T".

Nucleotide differences distinguishing the 3'
genomic promoter region and nucleotide and amino acid
differences distinguishing the L gene and L protein
sequences of the Edmonston wild-type isolate, vaccine
strains and other independently isolated wild-type
viruses were then compared and aligned (see Tables 3-5
in Example 1 below).

As shown in Table 3, there were three
mutations in the 3' genomic promoter region (in
antigenomic, message sense) between the progenitor
wild-type MV isolate and the derivative vaccine
strains: at nucleotide position 26, from "A" to "T"; at
position 42, from "A" to "C" or from "A" to "T"; and in
the case of Zagreb only, at position 96, from "G" to
"A". In addition, the other examined wild-type isolates
differed from both the progenitor wild-type isolate and
the vaccine strains at position 50 by having "A"
instead of "G".

The predicted amino acid sequences of the L
genes of measles vaccine strains (Rubeovax™, Moraten,
Schwarz, AIK-C and Zagreb) and wild-type isolates
(1977, 1983 and Montefiore) differ from the progenitor
strain (Edmonston) at 49 positions in the 2183 amino
acid long open reading frame (see Tables 4 and 5 in
Example 1 below).

These amino acid differences can be divided
into four categories:

(1) Positions where one vaccine strain
differs from the progenitor, as well as from other
vaccine and wild-type strains, suggesting a potential
attenuation site.

(2) Specific differences between all
wild-type and all vaccine sequences; these may also
constitute important attenuation sites.

(3) Residues where chronologically newer
wild-types differ from older wild-types, which may be
attributable to genetic drift.

(4) Positions where one or more vaccine
strains and/or wild-type strains have common amino
acids and differ from all the other strains; these
changes may represent lineage-specific, potentially
attenuating changes within the vaccine strains and
relatedness among the wild-type isolates, respectively.

There were four category (1) changes where 
one vaccine differed from the other vaccines, as well 
as the wild-type strains. Two of these were in Moraten
and Schwarz (amino acids 331 and 2114) and two were in 
AIK-C (1624 and 2074) . These mutations are of special 
interest because all of these viruses are good 
vaccines. Thus, these positions are sites for 
attenuation. 

Only one position, 1717, fits into category 
(2), with all wild-types having aspartic acid and all 
vaccines having alanine. Interestingly, this position 
is in one of two areas where the L genes of measles and 
canine distemper virus (which are otherwise highly 
homologous) do not show exceptional conservation. This 
difference makes it more likely that 1717 is a key 
position for an attenuating mutation in measles. 

There were five positions, 149, 636, 720,
2017 and 2119, where both chronologically newer
wild-types (1983 and Montefiore) differ from older
wild-types (Edmonston and 1977), which therefore fit into
category (3). These differences suggest genetic drift
rather than denoting sites of attenuating mutations.
Not included in this total are 16 positions where
Montefiore (the 1989 isolate) differed from the rest
(see Table 5). These could be either genetic drift
(category (3)) or random change (category (4)). The
remaining 23 positions are category (4), with one or
more of the viruses differing from the consensus.
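
The category counts given above can be reconciled with the 49 differing positions by a simple tally. The sketch below is only a bookkeeping check; the variable and function names are hypothetical, and the positions are those listed in the text (the 16 Montefiore-specific and 23 remaining category (4) positions are given in the text as counts only).

```python
# Positions listed explicitly in the text, grouped by category.
LISTED = {
    "category 1": [331, 2114, 1624, 2074],
    "category 2": [1717],
    "category 3": [149, 636, 720, 2017, 2119],
}
# Groups for which the text gives only a count.
COUNT_ONLY = {"Montefiore-specific": 16, "category 4": 23}


def total_positions():
    """Total number of differing L protein positions across all groups."""
    listed = sum(len(positions) for positions in LISTED.values())
    return listed + sum(COUNT_ONLY.values())


print(total_positions())  # 4 + 1 + 5 + 16 + 23 = 49
```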

Three of these positions (1409, 1649, 1936)
are potentially attenuating category (4) mutations.
These are changes where two vaccine strains have a
common change from the progenitor wild-type strain.
These changes may be connected with the vaccine lineage
leading to the Rubeovax™ and Moraten vaccines (Figure
1).

Applicants have found that their AIK-C
vaccine strain nucleotide sequence differs from the
published sequence (33) at 21 positions, including one
insertion and one deletion. Several of these
differences result in coding changes including two in
the L gene (at amino acids 1477 and 2008).

Thus, the additional changes accrued within
the L gene sequence as the measles progenitor strain is
progressively attenuated to achieve a replicative
capacity optimized for live vaccine purposes appear to
be constrained and delimited. Presumably, this limited
tolerance in the number and location of L gene changes
is imposed not only by the need to preserve the
multifunctional capacities of the polymerase, but also
by the preexisting 3' promoter changes with which the
evolving L protein must interact to achieve
transcription and replication. In other words, optimal
virus attenuation requires coordinate (i.e., linked)
changes in the polymerase protein and the cis-acting
regulatory elements on which it acts.

The 3'-leader displays the least tolerance
for change, allowing highly selected changes during the
attenuation process at nucleotide position 26 (always
the change from "A" to "T"), and at position 42 (the
change from "A" to "C" or from "A" to "T") (in
antigenomic, message sense). In the case of Zagreb
only, there is a single further change, from "G" to "A"
at position 96, which may be important when combined
with Zagreb L gene-specific changes. The 3'-leader
region seems to have undergone only one instance of
genetic drift since 1954, with a change of "G" to "A"
at position 50 (see Table 3).

The net change in the 3' genomic promoter
region during the attenuation process is the
replacement of two pyrimidines by two purines in
genomic sense in all MV vaccine strains. The
co-evolution of the L gene during these attenuation
processes is believed to reflect selection of subtle
changes favoring reproduction of the viruses in
different host cells. All the vaccine strains were
grown in chick embryo (CE) or chick embryo fibroblast
(CEF) cells during their attenuation process (Figure
1). In addition, some vaccine strains have been
exposed to unique host cells; i.e., Zagreb vaccine was
grown in dog kidney cells and human diploid cells,
while the AIK-C vaccine was adapted to sheep kidney
cells. Moraten and Rubeovax™ were exclusively
developed in CE and CEF.
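
The statement that the promoter changes replace pyrimidines with purines in genomic sense can be verified by complementing the antigenomic-sense bases reported for positions 26 and 42. The following is a minimal sketch with hypothetical function names; it treats the "T" of the cDNA-derived sequence as "U" in RNA.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
PURINES = {"A", "G"}


def to_genomic(base):
    """Complement an antigenomic (message-sense) base; DNA 'T' is read as 'U'."""
    return COMPLEMENT[base.replace("T", "U")]


def base_class(base):
    """Classify an RNA base as a purine or a pyrimidine."""
    return "purine" if base in PURINES else "pyrimidine"


def genomic_change(wild_type, vaccine):
    """Describe an antigenomic-sense substitution in genomic sense."""
    wt, vac = to_genomic(wild_type), to_genomic(vaccine)
    return f"{wt} ({base_class(wt)}) -> {vac} ({base_class(vac)})"


print(genomic_change("A", "T"))  # position 26, and one variant of position 42
print(genomic_change("A", "C"))  # the other variant of position 42
```

Both substitutions start from a genomic-sense U and end at a purine (A or G), in agreement with the text.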

Some of the lineage-specific L gene changes
(position 1649 in Rubeovax™, Moraten and Schwarz
vaccines and the change at position 1717 in all
vaccines) represent a subset of adaptations of the L
gene to the 3'-leader to modulate the
transcription/replication processes for vaccine
attenuation. Additionally, individual vaccine-specific
changes (category (1)) may provide additional fine-tuned
modulation of virus replication/transcription for each
vaccine strain.

Based on Table 3 and the foregoing
discussion, the key attenuating mutations for the MV 3'
genomic promoter region are nucleotide 26 (A → T),
nucleotide 42 (A → T or A → C) and nucleotide 96 (G →
A) (in antigenomic, message sense).

Based on Table 4 and the foregoing
discussion, the key attenuating sites for the L protein
are as follows: amino acid residues 331 (isoleucine →
threonine), 1409 (alanine → threonine), 1624
(threonine → alanine), 1649 (arginine → methionine),
1717 (aspartic acid → alanine), 1936 (histidine →
tyrosine), 2074 (glutamine → arginine) and 2114
(arginine → lysine). It is understood that the
nucleotide changes responsible for these amino acid
changes are not limited to those set forth in Table 4
of Example 1 below; all changes in nucleotides which
result in codons which are translated into these amino
acids are within the scope of this invention.
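
Because the genetic code is degenerate, several codons can specify most of the attenuating amino acids listed above. The following is a minimal sketch, assuming the standard genetic code; the function names are hypothetical, and the codons shown are illustrative rather than taken from Table 4. It enumerates the codons for a given amino acid and ranks them by how many nucleotide changes separate each from the nearest codon for the wild-type residue, which relates to the codon-stabilizing second and third mutations described earlier.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons_for(amino_acid):
    """All codons translated into the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def hamming(a, b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def reversion_distance(codon, wild_type_aa):
    """Fewest nucleotide changes turning codon into any wild-type codon."""
    return min(hamming(codon, wt) for wt in codons_for(wild_type_aa))


def stabilized_codons(mutant_aa, wild_type_aa):
    """Codons for the mutant residue, hardest to revert first."""
    return sorted(
        codons_for(mutant_aa),
        key=lambda c: (-reversion_distance(c, wild_type_aa), c),
    )


# Example: aspartic acid (D) -> alanine (A), as at L protein residue 1717.
print(codons_for("A"))
print(stabilized_codons("A", "D"))
```

For this example, the alanine codons GCA and GCG each require two nucleotide changes to revert to an aspartic acid codon, whereas GCC and GCT require only one.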

Human parainfluenza virus type 3 (HPIV-3) is
another nonsegmented, negative-sense, single stranded
enveloped RNA virus. HPIV-3 belongs to the Family
Paramyxoviridae (see Table 1). The genome of HPIV-3 is
15,462 nucleotides long and contains six non-overlapping
protein-encoding genes (57). Five of the genes encode
a single virion structural protein each, which are
designated NP (corresponding to the N protein of MV),
M, F, HN (hemagglutinin-neuraminidase) and L. The
sixth mRNA encodes the P protein, and by an overlapping
5' proximal open reading frame (ORF) encodes the C
protein, and by the RNA editing mechanism, also encodes
the D protein.

Like MV, HPIV-3 has a 3' non-protein coding
leader region of 55 nucleotides, but unlike measles
(where the trailer is 37 nucleotides), it has a 44
nucleotide long 5'-trailer region. The polymerase
transcribes the genome in a linear, sequential,
start-stop manner which is guided by transcription
signals in the RNA template.

Attempts to develop a live attenuated HPIV-3
vaccine by passaging the wild-type virus JS strain
through cell culture at sub-optimal temperature have
produced promising results (7,57). Several "cold
passage" (cp) mutants were isolated for evaluation from
different passage levels of the JS strain. One such
mutant resulted from 45 serial passages and was
designated cp45.

This virus exhibited three interesting 
properties: (1) cold adaptation (ca): the ability to 
replicate efficiently at the suboptimal temperature of 
20°C; (2) temperature sensitivity (ts): inability to
replicate in vitro at temperatures greater than or 
equal to 39°C; and (3) small plaque morphology. This 
mutant appeared to be a promising vaccine candidate 
because: (a) its ca, ts and small plaque phenotype is 
stable after passage in cell culture; (b) its 
replication is restricted in both the upper and lower 
respiratory tract of hamsters; and (c) it induced 
significant protection in hamsters against subsequent 
challenge with wild-type HPIV-3 (58,59). 

Evaluation of this strain in the rhesus
monkey showed the attenuation mutations in cp45 to be a
combination of ts and non-ts mutations (60).
Subsequent evaluation in chimpanzees indicated that
cp45 appeared to be satisfactorily attenuated while
still able to induce a high level of protection against
wild-type virus challenge (61). Later preliminary
clinical evaluation of cp45 in seronegative human
infants and small children suggested that this
candidate vaccine strain is suitably infectious and
attenuated, as well as being moderately immunogenic
(61).

The cp45 strain has been grown in both fetal
rhesus lung (FRhL) and Vero cells as follows: The
PIV-3 cp45 virus grown in FRhL cells was prepared by
inoculating confluent FRhL cell monolayers in tissue
culture flasks at an MOI of 0.1-1.0. The infected cell
cultures were fed with EMEM medium and incubated at
32°C. About seven days later, when maximal cytopathic
effects (syncytia) were observed, the virus was
harvested by subjecting the cultures to one freeze-thaw
cycle, pooling the fluids and then storing the virus at
-70°C.

The PIV-3 cp45 virus grown in Vero cells was
prepared by inoculating with virus a bioreactor culture
of confluent monolayers of Vero cells on microcarrier
beads which was continuously stirred. The infected
bioreactor culture was maintained at 30°C. The virus
was harvested 4-5 days later when syncytial CPE was
observed. The culture fluid containing the virus was
stored at -70°C.

The nucleotide sequences (in positive strand,
antigenomic, message sense) of the HPIV-3 JS wild-type
strain (89) and the cp45 vaccine strain grown in FRhL
and Vero cells, as well as the deduced amino acid
sequences of the RNA polymerase (L protein) of these
HPIV-3 viruses, are set forth as follows with reference
to the appropriate SEQ ID NOS. contained herein:

Virus            Nucleotide Sequence    L Protein Sequence

Wild-Type
JS               SEQ ID NO: 17          SEQ ID NO: 18

Vaccine
FRhL cp45        SEQ ID NO: 19          SEQ ID NO: 20
Vero cp45        SEQ ID NO: 21          SEQ ID NO: 22

Each PIV-3 virus genome listed above is
15,462 nucleotides in length. Translation of the L
gene starts with the codon at nucleotides 8646-8648;
the translation stop codon is at nucleotides
15345-15347. The translated L protein is 2,233 amino
acids long.

As detailed in Example 2 and Table 6 therein
below, based upon the differences between the wild-type
JS strain and the FRhL-grown cp45 mutant vaccine
strain, the key attenuating mutations for the HPIV-3 3'
genomic promoter region are nucleotide 23 (T → C),
nucleotide 24 (C → T), nucleotide 28 (G → T) and
nucleotide 45 (T → A) (in antigenomic, message sense).
As also detailed in Example 2 and Table 6 therein
below, key attenuating sites for the L protein of
HPIV-3 include the following: amino acid residues 942
(tyrosine → histidine), 992 (leucine → phenylalanine)
and 1558 (threonine → isoleucine).

In addition, the Vero-grown cp45 mutant 
vaccine strain contains an additional mutation 
resulting from a coding change in the L gene at amino 
acid residue 1292 (leucine -> phenylalanine) . 

It is understood that the nucleotide changes
responsible for these amino acid changes are not
limited to those set forth in Example 2 below; all
changes in nucleotides which result in codons which are
translated into these amino acids are within the scope
of this invention.
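
Merely to illustrate the preceding statement, the synonymous codons that encode each substituted residue can be listed from the standard genetic code. The sketch below is in Python, uses the DNA alphabet of the antigenomic message sense sequences, and assumes the standard genetic code applies; the names `CODON_TABLE`, `codons_for` and `HPIV3_L_SUBSTITUTIONS` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (third base varying fastest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid: str) -> list:
    """Return every codon translated into the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


# Substituted residues named in the text (one-letter codes):
# 942 histidine, 992 phenylalanine, 1292 phenylalanine, 1558 isoleucine.
HPIV3_L_SUBSTITUTIONS = {942: "H", 992: "F", 1292: "F", 1558: "I"}

for position, residue in sorted(HPIV3_L_SUBSTITUTIONS.items()):
    print(position, residue, codons_for(residue))
```

Any of the listed codons at the indicated position would yield the same amino acid substitution.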

Human respiratory syncytial virus (RSV) is
yet another nonsegmented, negative-sense, single
stranded enveloped RNA virus. RSV belongs to the
Subfamily Pneumovirinae and the genus Pneumovirus (see
Table 1).

Two major subgroups of human RSV, designated
A and B, have been identified based on reactivities of
the F and G surface glycoproteins with monoclonal
antibodies (62). More recently, the A and B lineages
of RSV strains have been confirmed by sequence analysis
(63,64). Bovine, ovine, and caprine strains of this
virus have also been isolated. The host specificity of
the virus is most clearly associated with the G
attachment protein, which is highly divergent between
the human and the bovine/ovine strains (65,66), and may
be influenced, at least in part, by receptor binding.

RSV is the primary cause of serious viral
pneumonia and bronchiolitis in infants and young
children. Serious disease, i.e., lower respiratory
tract disease (LRD), is most prevalent in infants less
than six months of age. It most commonly occurs in the
nonimmune infant's first exposure to RSV. RSV
additionally is associated with asthma and
hyperreactive airways and it is a significant cause of
mortality in "high risk" children with bronchopulmonary
dysplasia and congenital heart disease (CHD). It is
also one of the common viral respiratory infections
predisposing to otitis media in children. In adults,
RSV generally presents as uncomplicated upper
respiratory illness; however, in the elderly it rivals
influenza as a predisposing factor in the development
of serious LRD, particularly bacterial bronchitis and
pneumonia. Disease is always confined to the
respiratory tract, except in the severely
immunocompromised, where dissemination to other organs
can occur. Virus is spread to others by fomites
contaminated with virus-containing respiratory
secretions, and infection initiates through the nasal,
oral, or conjunctival mucosa.

RSV disease is seasonal and virus is usually
isolated only in the winter months, e.g., from November
to April in northern latitudes. The virus is
ubiquitous, and over 90% of children have been infected
at least once by 2 years of age. Multiple strains
cocirculate. There is no direct evidence of antigenic
drift (such as that seen with influenza A viruses), but
sequence studies demonstrating accumulation of amino
acid changes in the hypervariable regions of the G
and SH proteins suggest that immune pressure
may drive virus evolution.

In mouse and cotton rat models, both the F 
and G proteins of RSV elicit neutralizing antibodies 
and immunization with these proteins alone provides 
long-term protection against reinfection (67,68).

In humans, complete immunity to RSV does not 
develop and reinfections occur throughout life (69,70); 
however, there is evidence that immune factors will 
protect against severe disease. A decrease in severity 
of disease is associated with two or more prior 
infections and there is evidence that children infected 
with one of the two major RSV subgroups may be somewhat 
protected against reinfection with the homologous 
subgroup (71), observations which suggest that a live 
attenuated virus vaccine may provide protection 
sufficient to prevent serious morbidity and mortality. 
Infection with RSV elicits both antibody and cell 
mediated immunity. Serum neutralizing antibody to the 
F and G proteins has been associated, in some studies, 
with protection from LRD, although reduction in upper 
respiratory disease (URD) has not been demonstrated. 
High levels of serum antibody in infants are associated
with protection against LRD, and administration of
intravenous immunoglobulin with high RSV neutralizing 
antibody titers has been shown to protect against 
severe disease in high risk children (70,72,73). The 
role of local immunity, and nasal antibody in 
particular, is being investigated. 

The RSV virion consists of a 
ribonucleoprotein core contained within a lipoprotein 
envelope. The virions of pneumoviruses are similar in 
size and shape to those of all other paramyxoviruses. 
When visualized by negative staining and electron 
microscopy, virions are irregular in shape and range in
diameter from 150-300 nm (74). The nucleocapsid of
this virus is a symmetrical helix similar to that of
other paramyxoviruses, except that the helical diameter
is 12-15 nm rather than 18 nm. The envelope consists of
a lipid bilayer that is derived from the host membrane
and contains virally coded transmembrane surface
glycoproteins. The viral glycoproteins mediate
attachment and penetration and are organized separately
into virion spikes. All members of the paramyxovirus
subfamily have hemagglutinating activity, but this
function is not a defining feature for pneumoviruses,
being absent in RSV but present in PVM (75).
Neuraminidase activity is present in members of the
genera Paramyxovirus and Rubulavirus, and is absent in
Morbillivirus and Pneumovirus of mice (PVM) (75).

RSV possesses two subgroups, designated A and
B. The wild-type RSV (strain 2B) genome is a single
strand of negative-sense RNA of 15,218 nucleotides (SEQ
ID NO: 23) that are transcribed into ten major
subgenomic mRNAs. Each of the ten mRNAs encodes a
major polypeptide chain: three are transmembrane
surface proteins (G, F and SH); three are the proteins
associated with genomic RNA to form the viral
nucleocapsid (N, P and L); two are nonstructural
proteins (NS1 and NS2) which accumulate in the infected
cells but are also present in the virion in trace
amounts and may play a role in regulating transcription
and replication; one is the nonglycosylated virion
matrix protein (M); and the last is M2, another
nonglycosylated protein recently shown to be an
RSV-specified transcription elongation factor (see
Figure 3). These ten viral proteins account for nearly
all of the viral coding capacity.

The viral genome is encapsidated with the
major nucleocapsid protein (N), and is associated with
the phosphoprotein (P), and the large (L) polymerase
protein. These three proteins have been shown to be
necessary and sufficient for directing RNA replication
of cDNA encoded RSV minigenomes (76). Further studies
have shown that for transcription to proceed with full
processing, the M2 protein (ORF 1) is required (74).
When the M2 protein is missing, truncated transcripts
predominate, and rescue of the full length genome does
not occur (74).

Both the M (matrix protein) and the M2
proteins are internal virion-associated proteins that
are not present in the nucleocapsid structure. By
analogy with other nonsegmented negative-stranded RNA
viruses, the M protein is thought to render the
nucleocapsid transcriptionally inactive before
packaging and to mediate its association with the viral
envelope. The NS1 and NS2 proteins have only been
detected in very small amounts in purified virions, and
at this time are considered non-structural. Their
functions are uncertain, though they may be regulators
of transcription and replication. Three transmembrane
surface glycoproteins are present in virions: G, F, and
SH. G and F (fusion) are envelope glycoproteins that
are known to mediate attachment and penetration of the
virus into the host cell. In addition, these
glycoproteins represent major independent immunogens
(77). The function of the SH protein is unknown,
although a recent report has implicated its involvement
in the fusion function of the virus (78).

The genomes of two wild-type RSV subgroup B
strains (2B and 18537) have now been sequenced in their
entirety (see SEQ ID NOS:23 and 25, discussed below).
Genomic RNA is neither capped nor polyadenylated (79).
In both the virion and intracellularly, genomic RNA is
tightly associated with the N protein.

The 3' end of the genomic RNA consists of a
44-nucleotide extragenic leader region that is presumed
to contain the major viral promoter (Fig. 3). The 3'
genomic promoter region is followed by ten viral genes
in the order 3'-NS1-NS2-N-P-M-SH-G-F-M2-L-5' (Fig. 3).
The L gene is followed by a 145-149 nucleotide
extragenic trailer region (see Figure 3). Each gene
begins with a conserved nine-nucleotide gene start
signal 3'-GGGGCAAAU (except for the ten-nucleotide gene
start signal of the L gene, which is 3'-GGGACAAAAU;
differences underlined). For each gene, transcription
begins at the first nucleotide of the signal. Each
gene terminates with a semi-conserved 12-14 nucleotide
gene end (3'-A G U/G U/A ANNN U/A A(3-5)) (where N can
be any of the four bases) that directs transcription
termination and polyadenylation (Fig. 3). The first
nine genes are non-overlapping and are separated by
intergenic regions that range in size from 3 to 56
nucleotides for RSV B strains (Fig. 3). The intergenic
regions do not contain any conserved motifs or any
obvious features of secondary structure and have been
shown to have no influence on the preceding and
succeeding gene expression in a minireplicon system
(Fig. 3). The last two RSV genes overlap by 68
nucleotides (Fig. 3). The gene-start signal of the L
gene is located inside of, rather than after, the M2
gene. This 68 nucleotide overlap sequence encodes the
last 68 nucleotides of the M2 mRNA (exclusive of the
poly-A tail), as well as the first 68 nucleotides of
the L mRNA.

Ten different species of subgenomic
polyadenylated mRNAs and a number of polycistronic
polyadenylated read-through transcripts are the
products of genomic transcription (74).
Transcriptional mapping studies using UV light mediated
genomic inactivation showed that RSV genes are
transcribed in their 3' to 5' order from a single
promoter near the 3' end (80). Thus, RSV synthesis
appears to follow the single entry, sequential
transcription model proposed for all Mononegavirales
(16,81). According to this model, the polymerase (L)
contacts genomic RNA in the nucleocapsid form at the 3'
genomic promoter region and begins transcription at the
first nucleotide. RSV mRNAs are co-linear copies of
the genes, with no evidence of mRNA editing or
splicing.

Sequence analysis of intracellular RSV mRNAs
showed that synthesis of each transcript begins at the
first nucleotide of the gene start signal (74). The 5'
ends of the mRNAs are capped with the structure
m7G(5')ppp(5')Gp (where the underlined G is the first
template nucleotide of the mRNA) and the mRNAs are
polyadenylated at their 3' ends (82). Both of these
modifications are thought to be made
co-transcriptionally by the viral polymerase. Three
regions of the RSV 3' genomic promoter have been found
to be important as cis-acting elements (83). These
regions are the first ten nucleotides (presumably
acting as a promoter), nucleotides 21-25, and the gene
start signal located at nucleotides 45-53 (83). Unlike
other Paramyxovirinae, such as measles, Sendai and
PIV-3, the remainder of the leader and non-coding region
of the NS1 gene of RSV was found to be highly tolerant
of insertions, deletions and substitutions (83).

Additionally, by saturation mutagenesis
(wherein each base is replaced independently by each of
the other three bases and compared for translation and
replication efficiencies) within the first 12
nucleotides of the 3' genomic promoter region, a
U-tract located at nucleotides 6-10 was shown to be
highly inhibitory to substitutions (83). In contrast,
the first five nucleotides were relatively tolerant of
a number of substitutions and two of them at position
four were up-regulatory mutations, resulting in a four-
to 20-fold increase in RSV-CAT RNA replication and
transcription. Using a bi-cistronic minireplicon
system, gene-start and gene-end motifs were shown to be
signals for mRNA synthesis and appear to be
self-contained and largely independent of the nature of
adjoining sequence (84).

The L gene start signal lies 68 nucleotides
upstream of the M2 gene-end signal, resulting in gene
overlap (Fig. 3) (74). The presence of the M2 gene-end
signal within the L gene results in a high frequency of
premature termination of L gene transcripts. Full
length L mRNA is much less abundant and is made when
the polymerase fails to recognize the M2 gene-end
motif. This results in much lower transcription of L
mRNA. The gene overlap seems incompatible with a model
of linear sequential transcription. It is not known
whether the polymerase that exits the M2 gene jumps
backward to the L gene-start signal or whether there is
a second, internal promoter for L gene transcription
(74). It is also possible that the L gene is
accessible by a small fraction of polymerases that fail
to start transcription at the M2 gene-start signal and
slide down the M2 gene to the L gene-start signal.

The relative abundance of each RSV mRNA
decreases with the distance of its gene from the
promoter, presumably due to polymerase fall-off during
sequential transcription (80). Gene overlap is a
second mechanism that reduces the synthesis of full
length L mRNA. Also, certain mRNAs have features that
might reduce the efficiency of translation. The
initiation codon for SH mRNA is in a suboptimal Kozak
sequence context, while the G ORF begins at the second
methionyl codon in the mRNA.
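
The gradient of mRNA abundance described above can be pictured with a deliberately simplified model, offered only as a sketch. It assumes that a fixed fraction of polymerases falls off at every gene junction; the value 0.2 used below is hypothetical and is not a measured quantity, and the additional reduction of L mRNA caused by the M2/L gene overlap is not modelled. The gene order is the one given above; the function name is hypothetical.

```python
# RSV gene order from the 3' genomic promoter, as given in the text.
RSV_GENE_ORDER = ["NS1", "NS2", "N", "P", "M", "SH", "G", "F", "M2", "L"]


def relative_abundance(gene_order, falloff_per_junction):
    """Transcript level of each gene relative to the promoter-proximal gene.

    Assumes (for illustration only) that the same fraction of polymerases
    is lost at every gene junction during sequential transcription.
    """
    if not 0.0 <= falloff_per_junction < 1.0:
        raise ValueError("fall-off must be in the range [0, 1)")
    retained = 1.0 - falloff_per_junction
    return {gene: retained ** index for index, gene in enumerate(gene_order)}


# 0.2 is a hypothetical fall-off fraction chosen only to show the trend.
levels = relative_abundance(RSV_GENE_ORDER, falloff_per_junction=0.2)
for gene, level in levels.items():
    print(f"{gene:>3}  {level:.3f}")
```

Under these assumptions the promoter-distal L gene is always the least abundantly transcribed, whatever non-zero fall-off fraction is chosen.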

RSV RNA replication is thought (74) to follow
the model proposed from studies with vesicular
stomatitis virus and Sendai virus (16,81). This
involves a switch from the stop-start mode of mRNA
synthesis to an antiterminator read-through mode. This
results in synthesis of positive sense
replication-intermediate (RI) RNA that is an exact
complementary copy of genomic RNA. This serves in turn
as the template for the synthesis of progeny genomes.
The mechanism involved in the switch to the
antiterminator mode is proposed to involve
cotranscriptional encapsidation of the nascent RNA by N
protein (16,81). RNA replication in RSV, like other
nonsegmented negative-strand RNA viruses, is dependent
on ongoing protein synthesis (85). Predicted RI RNA has
been detected for the standard virus as well as RSV-CAT
minigenome (74,85). RI RNA was 10-20 fold less
abundant intracellularly than was the progeny genome
both for the standard and the minigenome system.

The nucleotide sequences (in positive strand, antigenomic,
message sense) of various wild-type, vaccine and
revertant RSV strains, as well as the deduced amino
acid sequences of the RNA polymerase (L protein) of
these RSV viruses, are set forth as follows with
reference to the appropriate SEQ ID NOS. contained
herein:

Virus            Nucleotide Sequence    L Protein Sequence

Wild-Type
  2B             SEQ ID NO: 23          SEQ ID NO: 24
  18537          SEQ ID NO: 25          SEQ ID NO: 26

Vaccine
  2B33F          SEQ ID NO: 27          SEQ ID NO: 28
  2B20L          SEQ ID NO: 29          SEQ ID NO: 30

Revertant
  2B33F TS(+)    SEQ ID NO: 31          SEQ ID NO: 32
  2B20L TS(+)    SEQ ID NO: 33          SEQ ID NO: 34



Each RSV virus genome encodes an L protein
that is 2,166 amino acids long. Genome length and
other nucleotide information is as follows:

Virus            Genome Length    L Start Codon    L Stop Codon

Wild-Type
  2B             15218            8502-8504        15000-15002
  18537          15229            8509-8511        15007-15009

Vaccine
  2B33F          15219            8503-8505        15001-15003
  2B20L          15219            8503-8505        15001-15003

Revertant
  2B33F TS(+)    15219            8503-8505        15001-15003
  2B20L TS(+)    15219            8503-8505        15001-15003



As detailed in Example 3 (especially Tables 7
and 8) below, the key attenuating mutations for the RSV
subgroup B 3' genomic promoter region are nucleotide 4
(C -> G), and the insertion of an additional A in the
stretch of A's at nucleotides 6-11 (in antigenomic
message sense). As also detailed in Example 3 below,
the key potentially attenuating sites for the L protein
of RSV are as follows: amino acid residues 353
(arginine -> lysine), 451 (lysine -> arginine), 1229
(aspartic acid -> asparagine), 2029 (threonine ->
isoleucine) and 2050 (asparagine -> aspartic acid). It
is understood that the nucleotide changes responsible
for these amino acid changes are not limited to those
set forth in Example 3 below; all changes in
nucleotides which result in codons which are translated
into these amino acids are within the scope of this
invention.
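
The single-nucleotide insertion in the 3' genomic promoter region is consistent with the one-nucleotide difference between the coordinates listed above for the wild-type 2B strain and for its vaccine derivatives. The following Python sketch illustrates that bookkeeping. It assumes that the inserted A is the only length difference between the genomes, and it places the insertion at the end of the A stretch (nucleotide 11), a choice made only for the example because the exact position of an insertion within a run of identical bases is ambiguous. The function name is hypothetical.

```python
def shift_after_insertion(position: int, insertion_site: int, inserted_length: int = 1) -> int:
    """Map a 1-based parent-genome coordinate onto the genome carrying an insertion.

    Positions downstream of insertion_site move by inserted_length;
    positions at or upstream of it are unchanged.
    """
    if position > insertion_site:
        return position + inserted_length
    return position


# Wild-type 2B coordinates as listed in the text.
WILD_TYPE_2B = {
    "genome_length": 15218,
    "L_start_codon": (8502, 8504),
    "L_stop_codon": (15000, 15002),
}
A_STRETCH_END = 11  # illustrative insertion site (end of the A stretch at 6-11)

shifted_start = tuple(shift_after_insertion(p, A_STRETCH_END) for p in WILD_TYPE_2B["L_start_codon"])
shifted_stop = tuple(shift_after_insertion(p, A_STRETCH_END) for p in WILD_TYPE_2B["L_stop_codon"])
shifted_length = shift_after_insertion(WILD_TYPE_2B["genome_length"], A_STRETCH_END)

print(shifted_start, shifted_stop, shifted_length)
```

The shifted values (8503-8505, 15001-15003 and 15219) match those listed for the 2B33F and 2B20L strains, while the change at nucleotide 4 lies upstream of the insertion and keeps its number.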

The attenuated viruses of this invention 
exhibit a substantial reduction of virulence compared 
to wild-type viruses which infect human and animal
hosts. The extent of attenuation is such that symptoms 
of infection will not arise in most immunized 
individuals, but the virus will retain sufficient 
replication competence to be infectious in and elicit 
the desired immune response profile in the vaccinee. 

The attenuated viruses of this invention may 
be used to formulate a vaccine. To do so, the 
attenuated virus is adjusted to an appropriate 
concentration and formulated with any suitable vaccine 
adjuvant, diluent or carrier. Physiologically 
acceptable media may be used as carriers. These 
include, but are not limited to: an appropriate 
isotonic medium, phosphate buffered saline and the 
like. Suitable adjuvants include, but are not limited 
to MPL™ (3-O-deacylated monophosphoryl lipid A; RIBI 
ImmunoChem Research, Inc., Hamilton, MT) and IL-12 
(Genetics Institute, Cambridge, MA) . 

In one embodiment of this invention, the
formulation including the attenuated virus is intended
for use as a vaccine. The attenuated virus may be mixed
with cryoprotective additives or stabilizers such as
proteins (e.g., albumin, gelatin), sugars (e.g.,
sucrose, lactose, sorbitol), amino acids (e.g., sodium
glutamate), saline, or other protective agents. This
mixture is maintained in a liquid state, or is then
desiccated or lyophilized for transport and storage and
mixed with water immediately prior to administration.

Formulations comprising the attenuated
viruses of this invention are useful to immunize a
human or animal subject to induce protection against
infection by the wild-type counterpart of the
attenuated virus. Thus, this invention further
provides a method of immunizing a subject to induce
protection against infection by an RNA virus of the
Order Mononegavirales by administering to the subject
an effective immunizing amount of a vaccine formulation
incorporating an attenuated version of that virus as
described hereinabove.

A sufficient amount of the vaccine in an
appropriate number of doses must be administered to the
subject to elicit an immune response. Persons skilled
in the art will readily be able to determine such
amounts and dosages. Administration may be by any
conventional effective form, such as intranasally,
parenterally, orally, or topically applied to any
mucosal surface such as intranasal, oral, eye, vaginal
or rectal surface, such as by an aerosol spray. The
preferred means of administration is by intranasal
administration.

In another embodiment of this invention, an
isolated nucleic acid molecule having the complete
viral nucleotide sequence of either the wild-type
viruses or vaccine viruses described herein is used to
generate oligonucleotide probes (from either positive
strand antigenomic message sense or negative strand
complementary genomic sense) and to express peptides
(from positive strand antigenomic message sense only),
which are used to detect the presence of those
wild-type virus and/or vaccine strains in samples of
body fluids and tissues. The nucleotide sequences are
used to design highly specific and sensitive diagnostic
tests to detect the presence of the virus in a sample.

Polymerase chain reaction (PCR) primers are 
synthesized with sequences based on the viral wild-type
or vaccine sequences described herein. The test sample 
is subjected to reverse transcription of RNA, followed 
by PCR amplification of selected cDNA regions 
corresponding to the nucleotide sequence described 
herein which have nucleotides which are distinct for a 
defined strain of virus. Amplified PCR products are 
identified on gels and their specificity confirmed by 
hybridization with specific nucleotide probes. 

ELISA tests are used to detect the presence
of antigens of the wild-type or vaccine viral strains.
Peptides are designed and selected to contain one or
more distinct residues based on the wild-type or
vaccine sequences described herein. These peptides are
then coupled to a carrier (e.g., keyhole limpet
hemocyanin (KLH)) and used to immunize animals (e.g.,
rabbits) for the production of monospecific polyclonal
antibody. A selection of these polyclonal antibodies,
or a combination of polyclonal and monoclonal
antibodies, can then be used in a "capture ELISA" to
detect antigens produced by those viruses.

Samples of the Moraten measles virus vaccine
strain were deposited by Applicants with the American
Type Culture Collection, 12301 Parklawn Drive,
Rockville, Maryland 20852, U.S.A., under the provisions
of the Budapest Treaty for the Deposit of
Microorganisms for the Purposes of Patent Procedures
("Budapest Treaty") and have been assigned ATCC
accession number VR2587. Samples of the HPIV-3 virus
Vero-grown cp45 vaccine strain were deposited by
Applicants with the American Type Culture Collection,
12301 Parklawn Drive, Rockville, Maryland 20852,
U.S.A., under the provisions of the Budapest Treaty and
have been assigned ATCC accession number VR2588.
Samples of the 2B wild-type RSV virus were deposited by
Applicants with the American Type Culture Collection,
12301 Parklawn Drive, Rockville, Maryland 20852,
U.S.A., under the provisions of the Budapest Treaty and
have been assigned ATCC accession number VR2586.

Given these three deposited strains and the
sequence information for these and other strains
provided herein, one can use site-directed mutagenesis
and rescue techniques described above to introduce
mutations into (or restore a wild-type genotype in) all
the strains described herein, as well as take these
strains and make additional mutations from the panel
of mutations set forth in Tables 3, 4 and 6-8 below.

In order that this invention may be better
understood, the following examples are set forth. The
examples are for the purpose of illustration only and
are not to be construed as limiting the scope of the
invention.

Examples

Standard molecular biology techniques are
utilized according to the protocols described in
Sambrook et al. (86).

Example 1
Measles

Moraten MV vaccine virus was grown once,
directly from the Attenuvax™ vaccine vial (Lot #0716B),
and the Schwarz vaccine virus was grown once (Lot
96G04/M179 G41D), while the Zagreb and Rubeovax™
vaccine viruses were each grown twice in Vero cells
before RNAs were made for sequence analysis. MV
wild-type isolate Montefiore (56) was passaged 5-6
times in Vero cells before extraction of RNA and,
similarly, MV wild-type isolates 1977 and 1983 (14)
were grown 5-7 times before extracting materials for
analysis. The Edmonston wild-type isolate received from
Dr. J. Beeler (CBER) (see Fig. 1) was the original
Edmonston isolate, which had already been passaged
seven times in human kidney cells and three times in
Vero cells before receipt; it was passaged once more in
Vero cells before being used for sequence analysis.

RNA was prepared from Vero cells that were
infected at a multiplicity of infection (m.o.i.) of 0.1
to 1.0 and allowed to reach maximum cytopathology
before being harvested. Total RNA from measles
virus-infected cells was extracted using Trizol™
reagent (Gibco-BRL).

The total RNA isolated from Vero cell passage
material was amplified by the Reverse Transcriptase-PCR
(Perkin-Elmer/Cetus) procedure using measles (Edmonston
B strain (19)) specific primer pairs spanning the 3'
and 5' promoter regions and the L gene of the viral
genome. Table 2 presents these primer sequences. The
primers of SEQ ID NOS:35-54, 74, 77 and 78 are in
antigenomic message sense. The primers of SEQ ID
NOS:55-73, 75, 76 and 79 are in genomic negative-sense.

Table 2

Primers for PCR and Sequencing MV L Genes
and Genomic Termini

9047 CATATCACTCACTCTGGGATGGAG 9070 (SEQ ID NO: 35)

9371 TCAGAACATCAAGCACCGCC 9390 (SEQ ID NO: 36)

9741 ACAGTCAAGACTGAGATGAG 9760 (SEQ ID NO: 37)

10001 AAGAGTCAGATACATGTGGA 10020 (SEQ ID NO: 38)

10351 ACATGAATCAGCCTAAAGTC 10370 (SEQ ID NO: 39)

10674 CCGAAAGAGTTCCTGCGTTACGACC 10699 (SEQ ID NO: 40)

11083 CAGTCCACACAAGTACCAGG 11102 (SEQ ID NO: 41)

11461 GTCAGAAGCTGTGGACCATC 11480 (SEQ ID NO: 42)

11841 AATATTGCTACAACAATGGC 11860 (SEQ ID NO: 43)

12196 ACTCTTCATTCCTAGACTGG 12215 (SEQ ID NO: 44)

12542 GTCCAATTATGACTATGAAC 12561 (SEQ ID NO: 45)

12891 AGAACAGACATGAAGCTTGC 12910 (SEQ ID NO: 46)

13232 CCAACAAGGAATGCTTCTAG 13251 (SEQ ID NO: 47)

13551 ACAGCACTATCTATGATTGACCTGG 13575 (SEQ ID NO: 48)

13930 GCAACATGGTTTACACATGC 13949 (SEQ ID NO: 49)

14280 AGATTGAGAGTTGATCCAGG 14299 (SEQ ID NO: 50)

14629 AGGAGATACTTAAACTAAGC 14648 (SEQ ID NO: 51)

14981 TAAGCTTATGCCTTTCAGCG 15000 (SEQ ID NO: 52)

15337 TTAACGGACCTAAGCTGTGC 15356 (SEQ ID NO: 53)

15671 GAAACAGATTATTATGACGG 15690 (SEQ ID NO: 54)

9290 CGGGCTATCTAGGTGAACTTCAGG 9267 (SEQ ID NO: 55)

9500 ATTTGGATATGGAATATGAG 9481 (SEQ ID NO: 56)

9840 ACTCAACTGAACTACCAGTG 9821 (SEQ ID NO: 57)

10181 AAGAACATCATGTATTTCAG 10162 (SEQ ID NO: 58)

10549 TTATCAACGCACTGCTCATG 10530 (SEQ ID NO: 59)

10919 ATTTTCAGCAATCACTTGGCATGCC 10895 (SEQ ID NO: 60)

11280 GCCTCTGTGCAAACAAGCTG 11261 (SEQ ID NO: 61)

11638 TCTCTAGTTACTCTAGCAGC 11619 (SEQ ID NO: 62)

12010 AGGTCGTTGTTTGTGAGGAG 11991 (SEQ ID NO: 63)

12361 TCGTCCTCTTCTTTACTGTC 12342 (SEQ ID NO: 64)

12689 CCGTCCTCGAGCTAGCCTCG 12670 (SEQ ID NO: 65)

13052 CTCCTCCAGGCTCACATTGG 13033 (SEQ ID NO: 66)

13420 GGGTTGGTACATAGCTCTGC 13401 (SEQ ID NO: 67)

13767 CACCCATCTGATATTTCCCTGATGG 13743 (SEQ ID NO: 68)

14099 TGGTTGACAGTACAAATCTG 14080 (SEQ ID NO: 69)

14460 CTGAAATGGGAAGATTGTGC 14441 (SEQ ID NO: 70)

14820 AGCAATCTACACTGCCTACC 14801 (SEQ ID NO: 71)

15180 TCACAGATGATTCAATTATC 15161 (SEQ ID NO: 72)

15530 GATCCTAGATATAAGTTCTC 15511 (SEQ ID NO: 73)

1 ACCAAACAAAGTTGGGTAAGG 21 (SEQ ID NO: 74)

GGGGGATCC 100 ATCCCTAATCCTGCTCTTGTCCC 78 (SEQ ID NO: 75)

200 GATTCCTCTGATGGCTCCAC 181 (SEQ ID NO: 76)

15721 TAACAGTCAAGGAGACCAAAG 15741 (SEQ ID NO: 77)

GGGAAGCTT 15801 AACCCTAATCCTGCCCTAGGTGG 15823 (SEQ ID NO: 78)

15894 ACCAGACAAAGCTGGGAATAGA 15873 (SEQ ID NO: 79)
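
As a sketch of how the flanking coordinates in Table 2 relate to the primers, the following Python example checks, for four of the primers, that the two coordinates span exactly the length of the primer and infers the sense of each primer from the direction of the coordinates. It assumes that the flanking numbers are the genome positions of the 5' and 3' nucleotides of each primer; the function names are hypothetical.

```python
# (SEQ ID NO, 5' coordinate, primer sequence, 3' coordinate), copied from Table 2.
PRIMERS = [
    (35, 9047, "CATATCACTCACTCTGGGATGGAG", 9070),
    (48, 13551, "ACAGCACTATCTATGATTGACCTGG", 13575),
    (55, 9290, "CGGGCTATCTAGGTGAACTTCAGG", 9267),
    (79, 15894, "ACCAGACAAAGCTGGGAATAGA", 15873),
]


def coordinates_match_length(five_prime: int, sequence: str, three_prime: int) -> bool:
    """True when the two coordinates span exactly len(sequence) nucleotides."""
    return abs(three_prime - five_prime) + 1 == len(sequence)


def primer_sense(five_prime: int, three_prime: int) -> str:
    """Ascending coordinates read as antigenomic sense, descending as genomic sense."""
    return "antigenomic" if three_prime > five_prime else "genomic"


for seq_id, five_prime, sequence, three_prime in PRIMERS:
    consistent = coordinates_match_length(five_prime, sequence, three_prime)
    print(seq_id, len(sequence), primer_sense(five_prime, three_prime), consistent)
```

For these four primers the inferred sense agrees with the assignment of SEQ ID NOS:35 and 48 to the antigenomic message sense group and of SEQ ID NOS:55 and 79 to the genomic negative-sense group.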

Overlapping PCR fragments of the complete 
viral genome were directly sequenced without cloning to 
achieve the consensus sequence, by the dideoxy 
terminator cycle sequencing method using both strands 
(ABI PRISM 377 sequencer and ABI PRISM sequencing Kit) . 
To determine the sequence at the absolute termini, a 
ligation procedure described previously was used (55) . 

To test this hypothesis, the nucleotide
sequences were determined for the non-protein coding
regulatory regions and the L gene of the progenitor
Edmonston wild-type MV isolate, for the available
vaccine strains derived from this isolate, as well as
for other wild-type strains. Nucleotide (in
antigenomic, message sense) and amino acid differences
were then compared and aligned as set forth in Tables
3-5 (differences are in italics):

Table 3

Differences in MV 3' Genomic Promoter Region
Nucleotide Sequence

                    Nucleotide number:
Virus               26    42    50    96

Edmonston w-t       A     A     G     G

Vaccines:
  Rubeovax™         T     C     G     G
  Moraten           T     C     G     G
  Schwarz           T     C     G     G
  Zagreb            T     T     G     A
  AIK-C             T     C     G     G

Wild-Types:
  1977              A     A     A     G
  1983              A     A     A     G
  Montefiore        A     A     A     G
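
The comparison summarized in Table 3 can be restated programmatically. The Python sketch below lists, for each strain, the promoter positions at which it differs from the Edmonston wild-type isolate, and then the positions altered in every vaccine strain. The data are transcribed from Table 3; the variable and function names are hypothetical.

```python
POSITIONS = (26, 42, 50, 96)
EDMONSTON_WT = "AAGG"

# Nucleotides at positions 26, 42, 50 and 96 (antigenomic, message sense).
VACCINES = {
    "Rubeovax": "TCGG",
    "Moraten": "TCGG",
    "Schwarz": "TCGG",
    "Zagreb": "TTGA",
    "AIK-C": "TCGG",
}
WILD_TYPES = {"1977": "AAAG", "1983": "AAAG", "Montefiore": "AAAG"}


def differences(reference: str, other: str) -> dict:
    """Map each differing position to a (reference base, other base) pair."""
    return {
        pos: (ref, alt)
        for pos, ref, alt in zip(POSITIONS, reference, other)
        if ref != alt
    }


for name, bases in {**VACCINES, **WILD_TYPES}.items():
    print(name, differences(EDMONSTON_WT, bases))

# Positions changed relative to Edmonston wild-type in every vaccine strain.
shared_vaccine_positions = set(POSITIONS)
for bases in VACCINES.values():
    shared_vaccine_positions &= set(differences(EDMONSTON_WT, bases))
print(sorted(shared_vaccine_positions))
```

On the Table 3 data, positions 26 and 42 differ from the Edmonston wild-type isolate in all five vaccine strains, whereas the other wild-type isolates differ only at position 50.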
Table 4

Differences in MV L Nucleotides and Amino Acids
Between Edmonston Wild-Type and Vaccine Strains

                  331   1409  1624  1649  1717  1887  1936  2074  2114

Edmonston w-t     ATT   GCA   ACC   AGG   GAT   AAC   CAT   CAA   AGA
Mutation          ACT   ACA   GCC   ATG   GCT   GAC   TAT   CGA   AAA

Edmonston w-t     I     A     T     R     D     N     H     Q     R
Rubeovax™ vac.    I     A     T     M     A     D     H     Q     R
Moraten vac.      T     A     T     M     A     D     H     Q     K
Schwarz vac.      T     A     T     M     A     D     H     Q     K
Zagreb vac.       I     T     T     R     A     N     H     Q     R
AIK-C vac.        I     T     A     R     A     N     Y     R     R
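
The relationship between the codon rows and the amino acid rows of Table 4 can be reproduced by translating each codon with the standard genetic code. The Python sketch below is illustrative only; it assumes the standard genetic code and uses hypothetical names. The codons are copied from the "Edmonston w-t" and "Mutation" rows of Table 4.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G
# (third base varying fastest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

L_POSITIONS = (331, 1409, 1624, 1649, 1717, 1887, 1936, 2074, 2114)
WILD_TYPE_CODONS = "ATT GCA ACC AGG GAT AAC CAT CAA AGA".split()
MUTANT_CODONS = "ACT ACA GCC ATG GCT GAC TAT CGA AAA".split()

wild_type_residues = "".join(CODON_TABLE[c] for c in WILD_TYPE_CODONS)
mutant_residues = "".join(CODON_TABLE[c] for c in MUTANT_CODONS)

for pos, wt, mut in zip(L_POSITIONS, WILD_TYPE_CODONS, MUTANT_CODONS):
    print(f"{pos}: {wt} ({CODON_TABLE[wt]}) -> {mut} ({CODON_TABLE[mut]})")
```

Each mutant codon differs from the wild-type codon by a single nucleotide, and the translated residues agree with the one-letter amino acid entries of Table 4.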
Example 2
PIV-3 

A comparison of sequences (in antigenomic
message sense) of the parental wild-type JS strain of
PIV-3 virus and the FRhL-grown and Vero-grown forms of
the cp45 mutant is set forth in Table 6. Where a
codon change does not result in an amino acid change,
Table 6 states "none", followed by the name of the
unchanged amino acid.

Sequence analysis of the parental wild-type
JS strain of PIV-3 virus and the FRhL-grown cp45 mutant
showed that the latter contained 20 nucleotide changes.
Four changes were in the noncoding 3'-leader region at
nucleotide positions 23 (T -> C), 24 (C -> T), 28 (G ->
T) and 45 (T -> A) (in antigenomic, message sense).
When considered in the genomic, negative sense, the
change at position 28 from the smaller pyrimidine ("C")
to the larger purine ("A") may change the size of the
region flanked by the conserved regions of the 3'
genomic promoter region, resulting in an altered
spatial presentation of the cis-acting signals to the
polymerase.
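
The conversion between the two senses used in this paragraph can be sketched as follows. The Python example takes the four 3'-leader changes as stated above in antigenomic, message sense and complements each base to give the corresponding change in the genomic, negative sense. It keeps the DNA alphabet (T rather than U) used for the sequences herein; the names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PURINES = {"A", "G"}

# 3'-leader changes (antigenomic, message sense) as stated in the text.
LEADER_CHANGES = {23: ("T", "C"), 24: ("C", "T"), 28: ("G", "T"), 45: ("T", "A")}


def to_genomic_sense(change):
    """Complement both bases of a (wild-type, mutant) pair."""
    return tuple(COMPLEMENT[base] for base in change)


def base_class(base: str) -> str:
    """Classify a base as a purine or a pyrimidine."""
    return "purine" if base in PURINES else "pyrimidine"


for position, change in sorted(LEADER_CHANGES.items()):
    wild_type, mutant = to_genomic_sense(change)
    print(position, f"{wild_type} -> {mutant}",
          f"({base_class(wild_type)} -> {base_class(mutant)})")
```

For position 28 the antigenomic change G -> T becomes C -> A in the genomic sense, that is, a pyrimidine replaced by a purine, as stated above.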

Nine changes were coding changes in the NP,
M, F, HN and L genes. The other seven changes were
non-coding or silent changes in the NP, P, F, HN and L
genes or the NP untranslated region (UTR). The cp45
mutant has been demonstrated to have poor transcription
activity at non-permissive temperatures due to its ts
phenotype (87). This ts phenotype has now been mapped
to the viral L gene (88). Because the cp45 virus has
been shown to function normally with regard to
mutations in the HN and F glycoproteins (87), this
supports the implication that mutations in the
3'-leader and L gene contributed to the attenuating
phenotype of this virus.

Thus, the four 3' leader specific changes in
FRhL-grown cp45 and the three coding changes in the L
gene at amino acid positions 942 (Tyr -> His), 992 (Leu
-> Phe) and 1558 (Thr -> Ile) contributed significantly
to the attenuation phenotype of the candidate cp45
vaccine strain.

Furthermore, the Vero-grown cp45 mutant
vaccine strain contains an additional mutation
resulting from a coding change in the L gene (marked
with an asterisk in Table 6) at amino acid residue 1292
(leucine -> phenylalanine).

The first two amino acid changes in the L
protein (at positions 942 and 992) map to one of the
highly conserved areas among all Paramyxovirus L genes.
The fourth amino acid change (at position 1558) maps to
the area joining two conserved blocks corresponding to
the change at amino acid 1717 in the MV vaccine
strains.

The published literature (89) sets forth only
18 changes between the antigenomic, message sense
sequences of the JS and FRhL-grown cp45 strains.
Sixteen of these changes were found by applicants.

The published literature did not report four
changes found by applicants: in the 3' leader at
nucleotide 45 (T -> A), in the NP UTR at nucleotide 62
(A -> T), and the changes in amino acids in the NP
protein resulting from the changes at nucleotide 397 (T
-> C), leading to the amino acid change (Val -> Ala), and
nucleotide 1275 (T -> G), leading to the amino acid
change (Ser -> Ala) (nucleotide changes in antigenomic,
message sense). Nor did the published literature
report the additional potentially attenuating mutation
in the L protein found by applicants in the Vero-grown
cp45 strain resulting from the change at nucleotide
12521 (A -> T), leading to the change in amino acid
1292 (Leu -> Phe).
Example 3
RSV Subgroup B

The temperature-sensitive (ts) phenotype is
strongly associated with attenuation in vivo; in
addition, some non-ts mutations may also be
attenuating. Identification of ts and non-ts
attenuating mutations was achieved by sequence analysis
and evaluation of ts, cold-adapted (ca), and in vivo
growth phenotypes of RSV mutants and revertants.

The genomes of the following five RSV 2B
strains have now been completely sequenced: 2B parent,
2B33F, one revertant designated 2B33F TS(+), 2B20L and
one revertant designated 2B20L TS(+). The 2B33F and
2B20L strains are ts and ca and are described in U.S.
Serial No. 08/059,444 (90), which is hereby
incorporated by reference. After identifying regions
where mutations in 2B33F and 2B20L are located, nine
additional isolates of 2B33F "revertants" obtained
following in vitro passaging at 39°C and in vivo
passaging in African Green Monkeys or chimpanzees, and
nine additional isolates of 2B20L "revertants" obtained
following in vitro passaging at 39°C, have been
sequenced in those regions. The ts, ca, and
attenuation phenotypes of many of these revertants have
now been characterized and assessed. Correlations
between the ts phenotype, vaccine attenuation and
sequence changes have been identified.

A summary of results is presented in Tables 7-12.
Table 7

Sequence comparison between RSV 2B and 2B33F strains

            Nucl. pos.†    Nucleotide changes
Gene/       3' end                            RSV 2B33F   Amino acid
region      of vRNA        RSV 2B  RSV 2B33F  TS(+)       changes

Genomic     4              C       G          G           non-coding
Promoter    6                      extra A    extra A     non-coding
M           4175           T       C          C           non-coding
            4199           T       C          C           non-coding
SH          4329           T       C          C           Phe-Leu (10)
            4409           T       C          C           none Ile (36)
            4420           T       C          C           Ile-Thr (40)
            4442           T       C          C           none His (47)
            4454           T       C          C           none Cys (51)
            4484           T       C          C           none Tyr (61)
            4497           T       C          C           Stop-Gln (66)
            4505           T       C          C           none Ser (68)
            4525           T       C          C           Ile-Thr (75)
            4526           T       C          C           Ile-Thr (75)
            4542           T       C          C           Stop-Gln (81)
            4561           T       C          C           Leu-Pro (87)
            4575           T       C          C           Trp-Arg (92)
            4598           T       C          C           none Thr (99)
L           9559           G       A          A           Arg-Lys (353)
            9853*          A       G          A           Lys-Arg (451)*
            12186          G       A          A           Asp-Asn (1229)
            14587          C       T          T           Thr-Ile (2029)
            15071          A       G          G           non-coding

† For 2B33F and 2B33F TS(+), nucl. pos. numbers are one larger
  than for 2B for M, SH & L genes
* At pos. 9853, the Lys-Arg change has reverted back to Lys in
  the 2B33F TS(+) strain
Table 8

Sequence comparison between RSV 2B and 2B20L strains

            Nucl. pos.†    Nucleotide changes
Gene/       3' end                            RSV 2B20L   Amino acid
region      of vRNA        RSV 2B  RSV 2B20L  TS(+), R1   changes
                                              revertant

Genomic     4              C       G          G           non-coding*
Promoter    6                      extra A    extra A     non-coding*
L           8963           C       T          T           none Thr (154)
            13347          A       A          G           Asn-Asp (1616)
            14587          C       T          T           Thr-Ile (2029)*
            14649          A       G          G           Asn-Asp (2050)
            14650          A       A          T           Asn-Asp-Val (2050)**

†  For 2B20L and 2B20L TS(+), nucl. pos. numbers are one larger
   than for 2B for L gene
*  Mutation is common in 2B33F and 2B20L strains
** At pos. 14650, the mutation suppresses the ts phenotype in
   2B20L TS(+) revertant



Table 9

[Table 9 is printed sideways across three pages of the source and
did not survive text extraction. The legible fragments show columns
for Source, In Vitro Phenotype (ts, ca), In Vivo growth in Cotton
Rat (nasal turbinates), Nasal Wash and Bronchial Lavage, with rows
for the ca, ts mutant isolated from 2B (cold-passaged x 20) and for
2B20L spinner passage isolates plaque picked at 39°C.]



Table 10
2B33F Revertants

                              TS(+)
              In vitro        AGM                    Chimp
base no.†     5a   4a   3b    pp2  pp4  pp6  pp7     1A   3A   5A

M
4176,4200     S    S    S     S    S    S    S       S    S    S
SH
14 bases*     S    S    S     S    S    S    S       S    S
L
9560          S    S    S     S    S    S    S       S    S    S
9854          2B   2B   2B    2B   9    S    S       ND   2B   2B
12187         S    S    S     S    S    S    S       S    S    S
14588         S    S    S     S    S    S    S       ND   S    S
15072         S    S    S     S    S    S    S       S    S    S

Phenotype
ts            2B   2B   2B    r    r    S    S       2B   2B   2B
ca            S    S    S     2B   S    2B   S       ND   ND   ND
Attenuated    r    r    r     (r)  (r)  S    S       ND   r    r

† These 2B33F revertant base nos. are one larger than for 2B for M,
  SH and L genes
* bases 4330, 4410, 4421, 4443, 4455, 4485, 4498, 4506, 4526, 4527,
  4543, 4562, 4576, 4599
S   = same base as 2B33F
2B  = reversion to 2B base or complete reversion in phenotype
r   = moderate reversion in phenotype
(r) = slight reversion in phenotype
ND  = not done
Table 11
2B20L Revertants

                        TS(+) In vitro Isolates
base no.†     R1   R2   R3A  R4A  R5A  R6A  R7A  R8A  R9A  R10A

L
8964          S    S    S    S    S    S    S    S    S    S
13348         C*   S    ND   S    S    ND   S    S    S    S
14588         S    S    S    S    S    S    S    S    S    S
14650         S    S    2B   S    2B   2B   S    S    2B   2B
14651         A*   A*   S    A*   S    S    A*   A*   S    S

Phenotype
ts            2B   2B   ND   ND   ND   ND   ND   ND   2B   2B
Attenuated    r    r    ND   ND   ND   ND   ND   ND   r    r

† These 2B20L revertant base nos. are one larger than for 2B for L
  genes
S  = same base as 2B20L
2B = reversion to 2B base
r  = moderate reversion in phenotype
*  = base change, different from 2B or 2B20L
ND = not done
Table 12

RSV 2B, ts and Revertant Strains: Phenotype Summary

Columns in the source: Virus Isolate; Source; In Vitro Phenotype
(ts, ca); In Vivo Attenuation (Cotton Rat, AGM). The column
alignment of the phenotype entries did not survive extraction, so
the surviving entries for each isolate are listed in their original
order.

Virus Isolate          Source                            Entries

RSV 2B                 Wild-type Parent Strain
RSV 2B33F              ca, ts mutant isolated from 2B,   ++, ...
                       cold-passaged x 33
RSV 2B33F - 5a TS(+)   2B33F spinner passage,            ++, +
                       plaque picked at 39°C
RSV 2B33F - 4a TS(+)   2B33F spinner passage,            ++, ND
                       plaque picked at 39°C
RSV 2B33F - 3b TS(+)   2B33F spinner passage,            -, ++, ND
                       plaque picked at 39°C
AGM pp2                2B33F-infected AGM A2,            +, +++, ND
                       d7 nasal wash, plaque
                       picked at 32°C
AGM pp4                2B33F-infected AGM A2,            +, +++, ND
                       d7 nasal wash, plaque
                       picked at 32°C
AGM pp6                2B33F-infected AGM A4,            ++++, ND
                       d12 nasal wash plaque
AGM pp7                2B33F-infected AGM A4,            ++++, ++, ND
                       d12 nasal wash, plaque
                       picked at 32°C
Chimp pp1A             2B33F-infected chimp              ND, ND, ND
                       #1552, d4 tracheal
                       lavage, plaque picked
                       at 32°C
Chimp pp3A             2B33F-infected chimp              ND, ++, ND
                       #1560, d5 tracheal
                       lavage, plaque picked
                       at 32°C
Chimp pp5A             2B33F-infected chimp              ND, ND
                       #1563, d10 tracheal
                       lavage, plaque picked
                       at 32°C
RSV 2B20L              ca, ts mutant isolated from 2B,   ++++, ++++
                       cold-passaged x 20
RSV 2B20L R1 TS(+)     2B20L spinner passage,            ND, ++, ND
                       plaque picked at 39°C
RSV 2B20L R2 TS(+)     2B20L spinner passage,            ND, ND
                       plaque picked at 39°C
RSV 2B20L R9 TS(+)     2B20L spinner passage,            ND, ND
                       plaque picked at 39°C
RSV 2B20L R10 TS(+)    2B20L spinner passage,            ND, ND
                       plaque picked at 39°C

ND = not done
- = wild-type phenotype, i.e., not temperature sensitive, not cold
    adapted, not attenuated
+ to ++++ = increasing levels of temperature sensitivity, cold-
    adaptation or attenuation
Several significant observations can be drawn
from these data:

a. As shown in Tables 7 (for 2B33F) and 8 (for
2B20L), there are relatively few sequence changes
identified in the two mutant strains: RSV 2B33F
differs from parental RSV 2B by two changes at the 3'
genomic promoter region, two changes at the non-coding
5'-end of the M gene, and four coding changes plus one
non-coding (poly(A) motif) change in the RNA dependent
RNA polymerase coding L gene. In addition, 14 changes
mapped to the SH gene alone. RSV 2B20L differs from
its RSV 2B parent only at seven nucleotide positions,
of which three are common with 2B33F virus, including
two changes at the 3' genomic promoter and one coding
change in the L gene. Two additional unique changes of
2B20L virus mapped to the coding region of the L gene.
Potentially attenuating mutations at the non-coding 3'
genomic promoter region and the RNA dependent RNA
polymerase gene have been identified.

b. Two ts mutations can be identified in the L
gene of the attenuated virus strains 2B33F and 2B20L:

(i) In 2B33F, a mutation at nucleotide position
9853 (A -> G) leading to a coding change in the L protein
at amino acid 451 (Lys -> Arg) is clearly associated
with the ts and attenuation phenotypes. Reversion at
this site alone in the 2B33F TS(+) 5a strain is
responsible for complete restoration of growth at 39°C
(Table 9) and partial reversion in attenuation in
animals. This association with the ts and attenuation
phenotypes was also supported by partial sequence
analyses of six additional "full TS revertants"
(designated 4a, 3b, pp2, 3A, 5a, 5A) isolated from cell
culture and from chimps, in which only the nucleotide
9853 mutation reverted (Tables 10-12) (note that one
AGM (African Green Monkey) isolate which reverted at
9853 only partially reverted in ts phenotype). This
amino acid 451 mutation (Lys -> Arg) is amenable to
stabilization in cDNA infectious clone constructs, by
inserting a second mutation to stabilize the codon,
thereby lessening the likelihood that it will revert
back to Lys.
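
The idea of stabilizing a codon with a second mutation can be illustrated with the standard genetic code: some arginine codons are a single nucleotide change away from a lysine codon, while others need two or three changes. The following is a minimal sketch under that assumption only. It does not represent the codon actually present in any virus or construct described here, and the helper names are hypothetical.

```python
# Codon sets from the standard genetic code.
LYS_CODONS = {"AAA", "AAG"}
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}


def hamming(a, b):
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def min_changes_to_lys(codon):
    """Fewest single-nucleotide changes needed to turn a codon into a Lys codon."""
    return min(hamming(codon, lys) for lys in LYS_CODONS)


# Map every Arg codon to the number of changes needed to revert to Lys.
reversion_cost = {codon: min_changes_to_lys(codon) for codon in sorted(ARG_CODONS)}
for codon, cost in reversion_cost.items():
    print(f"{codon}: {cost} change(s) to reach a Lys codon")
```

Under this sketch, AGA and AGG can revert to Lys with one change, whereas the CGN codons require at least two, which is the sense in which a second mutation could make reversion less likely.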

(ii) In 2B20L, a mutation at base 14,649 (A -> G)
leading to a coding change in the L protein (amino acid
position 2,050, Asn -> Asp) appears to be associated
with the ts and attenuation phenotypes. This aspartic
acid at amino acid 2050 invariably reverts back
(Asp -> Asn) in TS(+) revertants or changes to a
different amino acid (Asp -> Val) by nucleotide
substitution at position 14,650 (A -> T) (Tables 8,
11). The above observation is based on complete
sequence analysis of the TS(+) revertant R1 and partial
sequence of several additional TS(+) revertants (R2,
R4A, R7A, R8A) at selected regions (Table 11). An
additional mutation is seen in the R1 revertant at
nucleotide position 13,347 (amino acid 1616, Asn ->
Asp) associated with the above reversion. However, the
effect of this mutation on the ts phenotype is not
known; the L gene of other revertants has not been
sequenced completely.

c. Three base changes are common to 2B33F and
2B20L strains of virus:

(i) A change at position 14,587 (C -> T) with a
corresponding change (Thr -> Ile) at amino acid 2029 is
present in both 2B33F and 2B20L (Tables 7, 8). This
nucleotide "T" substitution was found to be present in
10% of the population of the progenitor RSV 2B strain
and may have been preferred during the attenuation
process. No wild-type base "C" was found in the 2B33F
and 2B20L virus.

(ii) Two mutations are seen in the 2B33F and 2B20L
3' genomic promoter region: nucleotide 4 (C -> G) and
the insertion of an extra A in the stretch of A's at
positions 6-11 (in antigenomic, message sense). When
the sequences of selected TS(+) revertants were
analyzed, these mutations were seen to have been
retained in the 2B33F TS(+) 5a (Table 7) and the 2B20L
TS(+) R1 (Table 8) revertants. These non-coding, cis-
acting mutations remained associated with partial viral
attenuation.

Expression using the minireplicon RSV-CAT
system for the analysis of these cis-acting changes has
shown that the 3' genomic promoter nucleotide 4 (C -> G)
change upregulates transcription/replication in this in
vitro system when the 2B progenitor virus or either of
the 2B33F or 2B33F TS(+) viruses provided helper L gene
functions (the N, P and M2 genes are identical in these
viruses).

Complementation analysis of the 2B33F 3'
genomic promoter and the helper functions provided by
the progenitor RSV 2B virus or the 2B33F and 2B33F TS(+)
viruses by this RSV-CAT minireplicon system has also
been conducted. All three viruses supported both the
2B and 2B33F 3' genomic promoter mediated
transcription/replication functions. However, the
2B33F and 2B33F TS(+) viruses preferred their 2B33F 3'
genomic promoters. This analysis clearly shows co-
evolution of 3' genomic promoter changes during the
vaccine attenuation process, along with the RNA
dependent RNA polymerase gene. Reversion of the ts
phenotype in the 2B33F revertant 5a, which sequence
analysis attributed to reversion of the single L protein
amino acid 451 (Arg -> Lys), was clearly demonstrated by
its support of transcription/replication functions of
the RSV-CAT minireplicon at 37°C. The 2B33F virus did
not provide helper functions to the RSV-CAT minireplicon
(with 2B or 2B33F 3' genomic promoters) at 37°C.

d. A biased hypermutation of SH seen in 2B33F is
present in all 2B33F revertants, regardless of
phenotype, and is not seen in 2B20L, which is ts, ca,
and attenuated. Thus, there are no data at this time
that associate this mutation with any biological
phenotype.

Another wild-type RSV designated 18537 was
also sequenced and compared to the sequence of the
wild-type RSV 2B strain. With one exception, the two
wild-type strains were identical at all the critical
residues described above. For 2B, the codon ACA at
nucleotides 14586-14588 encodes a Thr at amino acid
2029 of the L protein, while for 18537, the codon ATT
at nucleotides 14593-14595 encodes an Ile at amino acid
2029 (the L gene start codon is at nucleotides 8509-
8511 in 18537, compared to 8502-8504 in 2B).
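
The nucleotide and amino acid positions quoted above are related by simple codon arithmetic from the first nucleotide of the L gene start codon. The following is a minimal sketch of that arithmetic, given as an illustration rather than as the applicants' procedure. It assumes an uninterrupted open reading frame with 1-based numbering, and the function names are hypothetical.

```python
def codon_range(start_codon_first_nt, aa_position):
    """Return the (first, last) nucleotide positions of the codon for a residue.

    Positions are 1-based; the start codon encodes amino acid 1.
    """
    first = start_codon_first_nt + 3 * (aa_position - 1)
    return first, first + 2


def aa_position(start_codon_first_nt, nt_position):
    """Return (amino acid number, position within codon 1-3) for a nucleotide."""
    offset = nt_position - start_codon_first_nt
    return offset // 3 + 1, offset % 3 + 1


# L gene start codons quoted in the text: 8502-8504 in 2B, 8509-8511 in 18537.
print(codon_range(8502, 2029))   # codon for amino acid 2029 in 2B
print(codon_range(8509, 2029))   # codon for amino acid 2029 in 18537
print(aa_position(8502, 9853))   # residue affected by the change at nt 9853 in 2B
```

With the start codons given in the text, the sketch reproduces nucleotides 14586-14588 for 2B and 14593-14595 for 18537, and places nucleotide 9853 in the second position of codon 451.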

Example 4
PCR Assay to Detect Measles Virus

A 21 year old patient was admitted to a
hospital with a three week history of progressive non-
productive cough, shortness of breath, and fever. His
symptoms failed to improve following treatment with
clarithromycin for seven days or after a similar course
of treatment with atovaquone. Concomitant complaints
of right upper quadrant abdominal pain proved
recalcitrant to omeprazole and antacids. Relevant
past medical history included Factor VIII deficiency
and HIV infection diagnosed 3-4 years prior to this
hospital admission. One year earlier, he had received
a booster immunization of measles-mumps-rubella (MMR)
vaccine as required for college enrollment.

Bronchoalveolar lavage and transbronchial
biopsies performed two days after admission to the
hospital demonstrated reactive hyperplasia and alveolar
lining cell desquamation with minimal chronic
inflammation. No microorganisms were revealed by Gram,
methenamine silver, or PAS stains. CT scans of the
chest showed multiple, ill-defined, confluent nodules
at the left lung base. Despite administration of
empiric antimicrobials for opportunistic bacterial,
mycobacterial, and fungal pathogens commonly
responsible for pulmonary complications of advanced HIV
disease, the patient became and remained febrile to
39°C. A left-sided pleural effusion developed;
diagnostic thoracentesis showed it to be exudative but
otherwise non-diagnostic. Bronchoalveolar lavage
performed three weeks later only demonstrated alveolar
histiocytes, some of which were hemosiderin laden, a
few lymphocytes, and neutrophils. FITS, AFB, and
methenamine silver stains again were negative.

Two weeks thereafter, a wedge resection of
the left lung was performed through CT-guided
minithoracotomy. Multiple tissue sections revealed
nodular areas of acute and chronic inflammation with
regions of necrosis and fibrosis. Numerous
multinucleated giant cells were present, some of which
contained both intracytoplasmic and intranuclear
inclusions suggestive of measles virus giant cell
pneumonia. Special stains for bacteria, fungi, P.
carinii, and acid fast organisms again gave negative
results. Electron microscopic examination of sections
of this lung biopsy revealed particles morphologically
consistent with paramyxoviruses such as measles virus.
Serum anti-measles IgM titers determined by a solid
phase hemadsorbant assay were negative, as was a
subsequent IgM capture immunoassay.

Two weeks later, Rhesus monkey kidney (RMK)
tissue culture cells inoculated with the patient's lung
biopsy material revealed cytopathic changes
characteristic of measles virus infection.
Confirmation was obtained using an immunofluorescence
assay with monoclonal antibodies directed to measles
virus. Based upon this diagnosis, oral ribavirin
1000 mg B.I.D. was given for 14 days. Unfortunately,
the patient progressively deteriorated, eventually
dying two months later.

In order to ascertain the nature of the
measles virus present in the patient, reverse
transcription and PCR amplification of virus obtained
from infected tissues were performed, followed by
sequence analysis. The measles virus isolated from
Rhesus monkey kidney cells inoculated with tissue from
this patient's lung biopsy was propagated by two serial
passages in the continuous Vero (monkey kidney) tissue
culture cell line. Total infected cell RNA was
extracted at the second Vero cell passage using TRIzol
reagent (Life Technologies, Grand Island, NY) according
to the manufacturer's protocol. Total RNA was
similarly extracted from the patient's lung biopsy
material. The measles virus vaccine strain (Moraten)
currently used in the United States as a component of
the trivalent MMR vaccines was obtained in its
univalent form (Attenuvax™, Merck, Sharp & Dohme).
This virus was passaged once in Vero cells and total
vaccine infected cellular RNA then was extracted as
described above.

Each of these RNA preparations was reverse
transcribed (RT) to cDNA using random hexameric primers
and Moloney murine leukemia virus reverse transcriptase
(Perkin-Elmer/Cetus RT-PCR kit reagents, Perkin-Elmer-
Cetus, Branchburg, NJ). The cDNA then was amplified by
PCR using measles virus-specific oligodeoxynucleotide
primer pairs whose design was based on the Edmonston
measles virus sequence described above. These PCR
products comprised a set of overlapping DNA fragments
spanning the entire 15,894 nucleotide long measles
genome. A consensus genomic sequence was established
by direct analysis of each PCR product, without
cloning, using the dideoxy terminator cycle-sequencing
method established by the manufacturer (ABI PRISM 377
sequencer and ABI PRISM DNA sequencing kit; Perkin-
Elmer/Cetus, Foster City, CA). Both strands of the
PCR-amplified DNA products were analyzed to eliminate
possible sequencing ambiguities.
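
Whether a set of PCR fragments spans a genome without gaps can be checked mechanically. The following is a minimal sketch of such a check, offered only as an illustration. The amplicon coordinates are hypothetical placeholders, since the actual primer positions are not given in the text; only the genome length of 15,894 nucleotides comes from the description above.

```python
GENOME_LENGTH = 15894  # measles genome length stated in the text


def covers_genome(amplicons, genome_length):
    """Return True if 1-based inclusive (start, end) amplicons leave no gap.

    Amplicons are sorted by start; a gap exists when the next fragment
    starts beyond the position immediately after the covered region.
    """
    covered_to = 0
    for start, end in sorted(amplicons):
        if start > covered_to + 1:
            return False
        covered_to = max(covered_to, end)
    return covered_to >= genome_length


# Hypothetical overlapping fragments, for illustration only.
hypothetical_amplicons = [
    (1, 2100), (1900, 4300), (4100, 6500), (6300, 8800),
    (8600, 11000), (10800, 13300), (13100, 15894),
]

print(covers_genome(hypothetical_amplicons, GENOME_LENGTH))
```

Dropping any interior fragment from the hypothetical list leaves a gap, and the check then returns False.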

The nucleotide sequences of selected regions
of the measles virus genomes present in the patient's
viral isolate, as well as in the diseased lung tissue,
were compared with that of the Moraten vaccine virus,
as well as with the nucleotide sequences of other
measles virus wild-type and vaccine strains. This
sequence analysis revealed identity to the Moraten
vaccine strain rather than demonstrating relatedness to
past or currently circulating wild-type viruses or
other measles vaccine strains.
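
Assigning an isolate to the closest reference strain amounts to counting mismatches against each aligned reference sequence. The following is a minimal sketch of that comparison and is not the applicants' analysis. The short sequences and the strain labels are toy placeholders chosen for illustration, and the function names are hypothetical.

```python
def mismatches(a, b):
    """Count differing positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_reference(query, references):
    """Return the name of the reference with the fewest mismatches to the query."""
    return min(references, key=lambda name: mismatches(query, references[name]))


# Toy aligned sequences, for illustration only (not real strain data).
toy_references = {
    "vaccine_like": "ACTACAGCC",
    "wild_type_like": "ATTGCAACC",
}
toy_isolate = "ACTACAGCC"

print(closest_reference(toy_isolate, toy_references))
```

An isolate with zero mismatches to one reference and several to the others would be reported as identical to that reference, which is the kind of result described above for the Moraten strain.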
Example 5
ELISA to Detect RSV

An ELISA test is used to detect the presence
of RSV. Peptides are designed and selected based on
homologies to the RSV sequences described herein to be
specific for all subgroup B strains, or for individual
wild-type, vaccine or revertant RSV subgroup B strains
described herein. These peptides are then coupled to
KLH and used to immunize rabbits for the production of
monospecific polyclonal antibody. A selection of these
polyclonal antibodies, or a combination of polyclonal
and monoclonal antibodies, is then used in a "capture
ELISA" to detect the presence of an RSV antigen.
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Udem, Stephen A.
               Sidhu, Mohinderjit S.
               Tatem, Joanne M.
               Murphy, Brian R.
               Randolph, Valerie B.

(ii) TITLE OF INVENTION: 3' Genomic Promoter Region and
     Polymerase Gene Mutations Responsible for Attenuation in
     Viruses of the Order Designated Mononegavirales

(iii) NUMBER OF SEQUENCES: 79

(iv) CORRESPONDENCE ADDRESS:
     (A) ADDRESSEE: American Home Products Corporation
     (B) STREET: One Campus Drive
     (C) CITY: Parsippany
     (D) STATE: New Jersey
     (E) COUNTRY: United States
     (F) ZIP: 07054

(v) COMPUTER READABLE FORM:
     (A) MEDIUM TYPE: Floppy disk
     (B) COMPUTER: IBM PC compatible
     (C) OPERATING SYSTEM: PC-DOS/MS-DOS
     (D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:
     (A) APPLICATION NUMBER: US
     (B) FILING DATE:
     (C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
     (A) NAME: Gordon, Alan M.
     (B) REGISTRATION NUMBER: 30,637
     (C) REFERENCE/DOCKET NUMBER: 33,294 PCT

(ix) TELECOMMUNICATION INFORMATION:
     (A) TELEPHONE: 973/683-2157
     (B) TELEFAX: 973/683-4117

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 15894 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
ACCAAACAAA GTTGGGTAAG GATAGATCAA TCAATGATCA TATTCTAGTG CACTTAGGAT      60
TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAGGGAT ATCCGAGATG GCCACACTTT     120
TAAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG     180
GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATCCCTGGA GATTCCTCAA     240
TTACCACTCG ATCCAGACTT CTGGACCGGT TGGTCAGGTT AATTGGAAAC CCGGATGTGA     300
GCGGGCCCAA ACTAACAGGG GCACTAATAG GTATATTATC CTTATTTGTG GAGTCTCCAG     360
GTCAATTGAT TCAGAGGATC ACCGATGACC CTGACGTTAG CATAAGGCTG TTAGAGGTTG     420
TCCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG     480
ATGAGGCGGA CCAATACTTT TCACATGATG ATCCAATTAG TAGTGATCAA TCCAGGTTCG     540
GATGGTTCGA GAACAAGGAA ATCTCAGATA TTGAAGTGCA AGACCCTGAG GGATTCAACA     600
TGATTCTGGG TACCATCCTA GCCCAAATTT GGGTCTTGCT CGCAAAGGCG GTTACGGCCC     660
CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG     720
TAGTTGGTGA ATTTAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG     780
AGGACCTCTC CTTACGCCGA TTCATGGTCG CTCTAATCCT GGATATCAAG AGAACACCCG     840
GAAACAAACC CAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG     900
GATTAGCCAG TTTTATCCTG ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG     960
GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAC CTTTACCAGC    1020
AAATGGGGGA AACTGCACCC TACATGGTAA TCCTGGAGAA CTCAATTCAG AACAAGTTCA    1080
GTGCAGGATC ATACCCTCTG CTCTGGAGCT ATGCCATGGG AGTAGGAGTG GAACTTGAAA    1140
ACTCCATGGG AGGTTTGAAC TTTGGCCGAT CTTACTTTGA TCCAGCATAT TTTAGATTAG    1200
GGCAAGAGAT GGTAAGGAGG TCAGCTGGAA AGGTCAGTTC CACATTGGCA TCTGAACTCG    1260
GTATCACTGC CGAGGATGCA AGGCTTGTTT CAGAGATTGC AATGCATACT ACTGAGGACA    1320
AGATCAGTAG AGCGGTTGGA CCCAGACAAG CCCAAGTATC ATTTCTACAC GGTGATCAAA    1380
GTGAGAATGA GCTACCGAGA TTGGGGGGCA AGGAAGATAG GAGGGTCAAA CAGAGTCGAG    1440
GAGAAGCCAG GGAGAGCTAC AGAGAAACCG GGCCCAGCAG AGCAAGTGAT GCGAGAGCTG    1500
CCCATCTTCC AACCGGCACA CCCCTAGACA TTGACACTGC ATCGGAGTCC AGCCAAGATC    1560
CGCAGGACAG TCGAAGGTCA GCTGACGCCC TGCTTAGGCT GCAAGCCATG GCAGGAATCT    1620
CGGAAGAACA AGGCTCAGAC ACGGACACCC CTATAGTGTA CAATGACAGA AATCTTCTAG    1680
ACTAGGTGCG AGAGGCCGAG GACCAGAACA ACATCCGCCT ACCCTCCATC ATTGTTATAA    1740
AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCCATCA ACCATCCACT CCCACGATTG    1800
GAGCCGATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT    1860
CTCAAGGCCG AGCCCATCGG CTCACTGGCC ATCGAGGAAG CTATGGCAGC ATGGTCAGAA    1920
ATATCAGACA ACCCAGGACA GGAGCGAGCC ACCTGCAGGG AAGAGAAGGC AGGCAGTTCG    1980
GGTCTCAGCA AACCATGCCT CTCAGCAATT GGATCAACTG AAGGCGGTGC ACCTCGCATC    2040
CGCGGCCAGG GACCTGGAGA GAGCGATGAC GACGCTGAAA CTTTGGGAAT CCCCCCAAGA    2100
AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATTATG TTTATGATCA CAGCGGTGAA    2160
GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT    2220
AGCACCCTCT CAGGAGGAGA CAATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT    2280
GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG    2340
GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAG ACTCCAATCC    2400
AGAGGCAACA ACTTTCCGAA GCTTGGGAAA ACTCTCAATG TTCCTCCGCC CCCGGACCCC    2460
GGTAGGGCCA GCACTTCCGA GACACCCATT AAAAAGGGCA CAGACGCGAG ATTAGCCTCA    2520
TTTGGAACGG AGATCGCGTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA    2580
CCCTCGGAAC CATCAGGGCC AGGTGCACCT GCGGGGAATG TCCCCGAGTG TGTGAGCAAT    2640
GCCGCACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG    2700
AATAATGAAG AAGGGGGAGA CTATTATGAT GATGAGCTGT TCTCTGATGT CCAAGATATT    2760
AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCTCCAA GCTAGAATCA    2820
CTGCTGTTAT TGAAGGGAGA AGTTGAGTCA ATTAAGAAGC AGATCAACAG GCAAAATATC    2880
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A6CATATCCA CCCTOGAAGG ACACCTCTCA AGCATCATGA TCGCCATTCC TGGACTTGGG 
AAGGATCCCA ACGACCCCAC TGCAGATGTC GAAATCAATC CCGACTTGAA ACCCATCATA 
GGCAGAGATT CAGGCCGAGC ACTGGCCGAA CTTCTCAAGA AACCCGTTGC CAGCCGACAA 
CTCCAAGGAA TGACAAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAG 
CTAAAGCCGA TCGGGAAAAA GATGAGCTCA GCCGTCGGGT TTGTTCCTGA CACCGGCCCT 
GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 
CGTTACCTGA TGACTCTCCT TGATGATATC AAAGGAGCCA ATGATCTTGC CAAGTTCCAC
CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCCCATG 
CCAGTCGACC CAACTAGTAC AACCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT 
GCCTCCCAAG TTCCACAATG ACAGAGATCT ACGACTTCGA CAAGTCGGCA TGGGACATCA
AAGGGTCGAT CGCTCCGATA CAACCCACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG 
TCAGAGTCAT AGATCCTGGT CTAGGCGACA GGAAGGATGA ATGCTTTATG TACATGTTTC 
TGCTGGGGGT TGTTGAGGGC AGCGATCCCC TAGGGCCTCC AATCGGGCGA GCATTTGGGT 
CCCTGCCCTT AGGTGTTGGC AGATCCACAG CAAAGCCCGA AGAACTCCTC AAAGAGGCCA 
CTGAGCTTGA CATAGTTGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA 
ACAACACCCC ACTAACTCTC CTCACACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT 
TCAACGCAAA CCAAGTGTGC AATGCGGTTA ATCTGATACC GCTCGATACC CCGCAGAGGT 
TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCGGATAA CGGGTATTAC ACCGTTCCTA 
GAAGAATGCT GGAATTCAGA TCGGTCAATG CAGTGGCCTT CAACCTGCTG GTGACCCTTA 
GGATTGACAA GGCGATAGGC CCTGGGAAGA TCATCGACAA TACAGAGCAA CTTCCTGAGG 
CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA GAGTGAAGTC TACTCTGCCG 
ATTATTGCAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG
GCACCAGTCT TCACATTAGA AGCACAGGCA AGATGAGCAA GACTCTCCAT GCACAACTCG 
GGTTCAAGAA GACCTTATGT TACCCGCTGA TGGATATCAA TGAAGACCTT AATCGATTAC 
TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCTC 
AAGAATTCCG CATTTACGAC GACGTGATCA TAAATGATGA CCAAGGACTA TTCAAAGTTC 



2940 

3000 

3060 

3120 

3180 

3240 

3300 

3360 

3420 

3480

3540 

3600 

3660 

3720 

3780 

3840 

3900 

3960 

4020 

4080 

4140 

4200 

4260 

4320 
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TGTAGACCGT AGTGCCCAGC AATGCCCGAA AACGACCCCC CTCACAATGA CAGCCAGAAG 
GCCCGGACAA AAAAGCCCCC TCCGAAAGAC TCCACGGACC AAGCGAGAGG CCAGCCAGCA 
GCCGACGGCA AGCGCGAACA CCAGGCGGCC CCAGCACAGA ACAGCCCTGA CACAAGGCCA 
CCACCAGCCA CCCCAATCTG CATCCTCCTC GTGGGACCCC CGAGGACCAA CCCCCAAGGC 
TGCCCCCGAT CCAAACCACC AACCGCATCC CCACCACCCC CGGGAAAGAA ACCCCCAGCA 
ATTGGAAGGC CCCTCCCCCT CTTCCTCAAC ACAAGAACTC CACAACCGAA CCGCACAAGC 
GACCGAGGTG ACCCAACCGC AGGCATCCGA CTCCCTAGAC AGATCCTCTC TCCCCGGCAA 
ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCAAC AGAACCCAGA CCCCGGCCCA 
CGGCGCCGCG CCCCCAACCC CCGACAACCA GAGGGAGCCC CCAACCAATC CCGCCGGTTC 
CCCCGGTGCC CACAGGCAGG GACACCAACC CCCGAACAGA CCCAGCACCC AACCATCGAC 
AATCCAAGAC GGGGGGGCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCACCGC 
GAGGAAGCCC ACCCACCCCA CACACGACCA CGGCAACCAA ACCAGAACCC AGACCACCCT 
GGGCCACCAG CTCCCAGACT CGGCCATCAC CCCGCAGAAA GGAAAGGCCA CAACCCGCGC 
ACCCCAGCCC CGATCCGGCG GGGAGCCACC CAACCCGAAC CAGCACCCAA GAGCGATCCC 
CGAAGGACCC CCGAACCGCA AAGGACATCA GTATCCCACA GCCTCTCCAA GTCCCCCGGT 
CTCCTCCTTT TCTCGAAGGG ACCAAAAGAT CAATCCACCA CACCCGACGA CACTCAACTC 
CCCACCCCTA AAGGAGACAC CGGGAATCCC AGAATCAAGA CTCATCCAAT GTCCATCATG 
GGTCTCAAGG TGAACGTCTC TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAGACACCC 
ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGAAT AGGAAGTGCA 
AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTAG TCATAAAATT AATGCCCAAT 
ATAACTCTCC TCAATAACTG CACGAGGGTA GAGATTGCAG AATACAGGAG ACTACTGAGA 
ACAGTTTTGG AACCAATTAG AGATGCACTT AATGCAATGA CCCAGAATAT AAGACCGGTT 
CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCGG GAGTAGTCCT GGCAGGTGCG 
GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 
CTGAACTCTC AAGCCATCGA CAATCTGAGA GCGAGCCTGG AAACTACTAA TCAGGCAATT 
GAGGCAATCA GACAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC
ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCCAGAAG
CTCGGGCTCA AATTGCTCAG ATACTATACA GAAATCCTGT CATTATTTGG CCCCAGCTTA 
CGGGACCCCA TATCTGCGGA GATATCTATC CAGGCTTTGA GCTATGCGCT TGGAGGAGAC 
ATCAATAAGG TGTTAGAAAA GCTCGGATAC AGTGGAGGTG ATTTACTGGG CATCTTAGAG 
AGCAGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTCCTC 
AGTATAGCCT ATCCGACGCT GTCCGAGATT AAGGGGGTGA TTGTCCACCG GCTAGAGGGG 
GTCTCGTACA ACATAGGCTC TCAAGAGTGG TATACCACTG TGCCCAAGTA TGTTGCAACC 
CAAGGGTACC TTATCTCGAA TTTTGATGAG TCATCGTGTA CTTTCATGCC AGAGGGGACT 
GTGTGCAGCC AAAATGCCTT GTACCCGATG AGTCCTCTGC TCCAAGAATG CCTCCGGGGG 
TCCACCAAGT CCTGTGCTCG TACACTCGTA TCCGGGTCTT TTGGGAACCG GTTCATTTTA
TCACAAGGGA ACCTAATAGC CAATTGTGCA TCAATCCTTT GCAAGTGTTA CACAACAGGA 
ACGATCATTA ATCAAGACCC TGACAAGATC CTAACATACA TTGCTGCCGA TCACTGCCCG 
GTAGTCGAGG TGAACGGCGT GACCATCCAA GTCGGGAGCA GGAGGTATCC AGACGCTGTG 
TACTTGCACA GAATTGACCT CGGTCCTCCC ATATCATTGG AGAGGTTGGA CGTAGGGACA 
AATCTGGGGA ATGCAATTGC TAAGTTGGAG GATGCCAAGG AATTGTTGGA GTCATCGGAC
CAGATATTGA GGAGTATGAA AGGTTTATCG AGCACTAGCA TAGTCTACAT CCTGATTGCA 
GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT 
AACAAAAAGG GAGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACGGGA 
ACATCAAAAT CCTATGTAAG GTCGCTCTGA TCCTCTACAA CTCTTGAAAC ACAAATGTCC 
CACAAGTCTC CTCTTCGTCA TCAAGCAACC ACCGCACCCA GCATCAAGCC CACCTGAAAT 
TATCTCCGGC TTCCCTCTGG CCGAACAATA TCGGTAGTTA ATTAAAACTT AGGGTGCAAG 
ATCATCCACA ATGTCACCAC AACGAGACCG GATAAATGCC TTCTACAAAG ATAACCCCCA 
TCCCAAGGGA AGTAGGATAG TCATTAACAG AGAACATCTT ATGATTGATA GACCTTATGT 
TTTGCTGGCT GTTCTGTTTG TCATGTCTCT GAGCTTGATC GGGTTGCTAG CCATTGCAGG 
CATTAGACTT CATCGGGCAG CCATCTACAC CGCAGAGATC CATAAAAGCC TCAGCACCAA
TCTAGATGTA ACTAACTCAA TCGAGCATCA GGTCAAGGAC GTGCTGACAC CACTCTTCAA
AATCATCGGT GATGAAGTGG GCCTGAGGAC ACCTCAGAGA TTCACTGACC TAGTGAAATT
CATCTCTGAC AAGATTAAAT TCCTTAATCC GGATAGGGAG TACGACTTCA GAGATCTCAC 
TTGGTGTATC AACCCGCCAG AGAGAATCAA ATTGGATTAT GATCAATACT GTGCAGATGT 
GGCTGCTGAA GAGCTCATGA ATGCATTGGT GAACTCAACT CTACTGGAGA CCAGAACAAC 
CAATCAGTTC CTAGCTGTCT CAAAGGGAAA CTGCTCAGGG CCCACTACAA TCAGAGGTCA 
ATTCTCAAAC ATGTCGCTGT CCCTGTTAGA CTTGTATTTA AGTCGAGGTT ACAATGTGTC 
ATCTATAGTC ACTATGACAT CCCAGGGAAT GTATGGGGGA ACTTACCTAG TGGAAAAGCC 
TAATCTGAGC AGCAAAAGGT CAGAGTTGTC ACAACTGAGC ATGTACCGAG TGTTTGAAGT
AGGTGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATCTTGA 
GCAACCAGTC AGTAATGATC TCAGCAACTG TATGGTGGCT TTGGGGGAGC TCAAACTCGC 
AGCCCTTTGT CACGGGGAAG ATTCTATCAC AATTCCCTAT CAGGGATCAG GGAAAGGTGT 
CAGCTTCCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT 
CCCCTTATCA ACGGATGATC CAGTGATAGA CAGGCTTTAC CTCTCATCTC ACAGAGGTGT 
TATCGCTGAC AATCAAGCAA AATGGGCTGT CCCGACAACA CGAACAGATG ACAAGTTGCG 
AATGGAGACA TGCTTCCAAC AGGCGTGTAA GGGTAAAATC CAAGCACTCT GCGAGAATCC 
CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGGGTCTTGT CTGTTGATCT 
GAGTCTGACA GTTGAGCTTA AAATCAAAAT TGCTTCGGGA TTCGGGCCAT TGATCACACA 
CGGTTCAGGG ATGGACCTAT ACAAATCCAA CCACAACAAT GTGTATTGGC TGACTATCCC 
GCCAATGAAG AACCTAGCCT TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA 
GGTTAGTCCC AACCTCTTCA CTGTCCCAAT TAAGGAAGCA GGCGAAGACT GCCATGCCCC 
AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTGATTCT 
ACCTGGTCAA GATCTCCAAT ATGTTTTGGC AACCTACGAT ACTTCCAGGG TTGAACATGC 
TGTGGTTTAT TACGTTTACA GCCCAGGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT 
GCCTATAAAG GGGGTCCCCA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT 
CTGGTGCCGT CACTTCTGTG TGCTTGCGGA CTCAGAATCT GGTGGACATA TCACTCACTC 
TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACCCGG GAAGATGGAA CCAATCGCAG 



7620 
7680 
7740 
7800 
7860 
7920 
7980 
8040 
8100
8160 
8220 
8280 
8340 
8400 
8460 
8520 
8580 
8640 
8700 
8760 
8820 
8880 
8940 
9000 
9060 
9120 






ATAGGGCTGC TAGTGAACCA ATCACATGAT GTCACCCAGA CATCAGGCAT ACCCACTAGT
GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTCCCC GTTATGGACT 
CGCTATCTGT CAACCAGATC TTATACCCTG AAGTTCACCT AGATAGCCCG ATAGTTACCA 
ATAAGATAGT AGCCATCCTG GAGTATGCTC GAGTCCCTCA CGCTTACAGC CTGGAGGACC 
CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTTTCCAAC CAAATGATTA 
TAAACAATGT GGAAGTTGGG AATGTCATCA AGTCCAAGCT TAGGAGTTAT CCGGCCCACT
CTCATATTCC ATATCCAAAT TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA 
CGAGGAAGAT CCGTGAACTC CTCAAAAAGG GGAATTCGCT GTACTCCAAA GTCAGTGATA 
AGGTTTTCCA ATGCTTAAGG GACACTAACT CACGGCTTGG CCTAGGCTCC GAATTGAGGG 
AGGACATCAA GGAGAAAGTT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAGTGGTTTG 
AGCCCTTTCT GTTTTGGTTT ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA 
CCCATACTTG CCATAGGAGG AGACACACAC CTGTATTCTT CACTGGTAGT TCAGTTGAGT 
TGCTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAAGA GTCTCAACAT GTATATTACC 
TGACATTTGA ACTGGTTTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA 
CCGCTATGAC TATTGATGCT AGGTATACAG AGCTTCTAGG AAGAGTCAGA TACATGTGGA 
AACTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTATCAAATT GTAGCCATGC 
TGGAGCCTCT TTCACTTGCT TACCTGCAGC TGAGGGATAT AACAGTAGAA CTCAGAGGTG 
CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG 
ATGAAGGTAC TTATCATGAG TTAATTGAAG CTCTAGATTA CATTTTCATA ACTGATGACA 
TACATCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG 
CAGTAACGGC TGCTGAAAAT GTTAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG 
AGACTCTGAT GAAAGGTCAT GCCATATTTT GTGGAATCAT AATCAACGGC TATCGTGACA 
GGCACGGAGG CAGTTGGCCA CCGCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA
ATGCTCAAGC TTCAGGTGAA GGGTTAACAC ATGAGCAGTG CGTTGATAAC TGGAAATCTT 
TTGCTGGAGT GAAATTTGGC TGCTTTATGC CTCTTAGCCT GGATAGTGAT CTGACAATGT 
ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG
AGTTCCTGCG TTACGACCCT CCCAAGGGAA CCGGGTCACG GAGGCTTGTA GATGTTTTCC
TTAATGATTC GAGCTTTGAC CCATATGATG TGATAATGTA TGTTGTAAGT GGAGCTTACC 
TCCATGACCC TGAGTTCAAC CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG 
GTAGACTTTT TGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATT GCTGAAAATC 
TAATCTCAAA CGGGATTGGC AAATATTTTA AGGACAATGG GATGGCCAAG GATGAGCACG 
ATTTGACTAA GGCACTCCAC ACTCTAGCTG TCTCAGGAGT CCCCAAAGAT CTCAAAGAAA 
GTCACAGGGG GGGGCCAGTC TTAAAAACCT ACTCCCGAAG CCCAGTCCAC ACAAGTACCA 
GGAACGTGAG AGCAGCAAAA GGGTTTATAG GGTTCCCTCA AGTAATTCGG CAGGACCAAG 
ACACTGATCA TCCGGAGAAT ATGGAAGCTT ACGAGACAGT CAGTGCATTT ATCACGACTG 
ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTGTTT GCACAGAGGC 
TAAATGAGAT TTACGGATTG CCCTCATTTT TCCAGTGGCT GCATAAGAGG CTTGAGACCT 
CTGTCCTGTA TGTAAGTGAC CCTCATTGCC CCCCCGACCT TGACGCCCAT ATCCCGTTAT 
ATAAAGTCCC CAATGATCAA ATCTTCATTA AGTACCCTAT GGGAGGTATA GAAGGGTATT
GTCAGAAGCT GTGGACCATC AGCACCATTC CCTATCTATA CCTGGCTGCT TATGAGAGCG 
GAGTAAGGAT TGCTTCGTTA GTGCAAGGGG ACAATCAGAC CATAGCCGTA ACAAAAAGGG
TACCCAGCAC ATGGCCCTAC AACCTTAAGA AACGGGAAGC TGCTAGAGTA ACTAGAGATT 
ACTTTGTAAT TCTTAGGCAA AGGCTACATG ATATTGGCCA TCACCTCAAG GCAAATGAGA 
CAATTGTTTC ATCACATTTT TTTGTCTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG 
TGTCCCAATC ACTCAAGAGC ATCGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG 
AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 
ATGACCGTTA CCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAA ATTCTGATCT 
CTCTTGGCTT CACAATCAAT TCAACCATGA CCCGGGATGT AGTCATACCC CTCCTCACAA 
ACAACGACCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATTGGGGGG ATGAATTATC 
TGAATATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 
ATCTCAAGAG AATGATTCTC GCCTCACTAA TGCCTGAAGA GACCCTCCAT CAAGTAATGA 
CACAACAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC 
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TTGTATGTGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCCTGA 
TCCATAGTCC AAACCCAATG TTAAAAGGAT TATTCCATGA TGACAGTAAA GAAGAGGACG 
AGGGACTGGC GGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCATGAAA 
TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTG GATACCACAA 
AAGGCCTGAT TCGAGCCAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT 
TGTCCAATTA TGACTATGAA CAATTCAGAG CAGGGATGGT GCTATTGACA GGAAGAAAGA 
GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCGAGAGCT CTAAGAAGCC 
ATATGTGGGC GAGGCTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC 
TAGAATCTAT GCGAGGCCAC CTTATTCGGC GTCATGAGAC ATGTGTCATC TGCGAGTGTG
GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA 
AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA 
TGAAGCTTGC CTTCGTAAGA GCCCCAAGTC GATCCTTGCG ATCTGCTGTT AGAATAGCAA 
CAGTGTACTC ATGGGCTTAC GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG 
CTAGGCAAAG GGCCAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCAACTT 
CGACTAATTT AGCGCATAGG TTGAGGGATC GTAGCACTCA AGTGAAATAC TCAGGTACAT 
CCCTTGTCCG AGTGGCGAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT 
CAGATAAGAA GGTTGATACT AACTTTATAT ACCAACAAGG AATGCTTCTA GGGTTGGGTG 
TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACCGGATC ATCTAACACG GTATTACATC 
TTCACGTCGA AACAGATTGT TGCGTGATCC CGATGATAGA TCATCCCAGG ATACCCAGCT 
CCCGCAAGCT AGAGCTGAGG GCAGAGCTAT GTACCAACCC ATTGATATAT GATAATGCAC 
CTTTAATTGA CAGAGATGCA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTGG 
AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTTT AGCTAAGTCC ACAGCACTAT 
CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA 
TAGGGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTCATAGAG CCAAGATTAT 
TCACTATCTA CTTGGGCCAG TGTGCGGCCA TCAATTGGGC ATTTGATGTA CATTATCATA 
GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCATC GTTCCTTTCT AGAATGAGCA
AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT
GGCATTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTGCACA 
CAACTGTGTG CAACATGGTT TACACATGCT ATATGACCTA CCTCGACCTG TTGTTGAATG 
AAGAGTTAGA AGAGTTCACA TTTCTCTTGT GTGAAAGCGA CGAGGATGTA GTACCGGACA 
GATTCGACAA CATCCAGGCA AAACACTTAT GTGTTCTGGC AGATTTGTAC TGTCAACCAG 
GGACCTGCCC ACCAATTCGA GGTCTAAGAC CGGTAGAGAA ATGTGCAGTT CTAACCGACC 
ATATCAAGGC AGAGGCTAGG TTATCTCCAG CAGGATCTTC GTGGAACATA AATCCAATTA 
TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTCCGGCG AGGATCGATC AAACAGATAA 
GATTGAGAGT TGATCCAGGA TTCATTTTCG ACGCCCTCGC TGAGGTAAAT GTCAGTCAGC 
CAAAGATCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGATTTCAGA CCCCCACACG
ATGATGTTGC AAAATTGCTC AAAGATATCA ACACAAGCAA GCACAATCTT CCCATTTCAG 
GGGGCAATCT CGCCAATTAT GAAATCCATG CTTTCCGCAG AATCGGGTTG AACTCATCTG 
CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAAG 
ACGGCTTGTT CTTGGGTGAG GGATCGGGTT CTATGTTGAT CACTTATAAG GAGATACTTA
AACTAAACAA GTGCTTCTAT AATAGTGGGG TTTCCGCCAA TTCTAGATCT GGTCAAAGGG 
AATTAGCACC CTATCCCTCC GAAGTTGGCC TTGTCGAACA CAGAATGGGA GTAGGTAATA 
TTGTCAAAGT GCTCTTTAAC GGGAGGCCCG AAGTCACGTG GGTAGGCAGT GTAGATTGCT 
TCAATTTCAT AGTTAGTAAT ATCCCTACCT CTAGTGTGGG GTTTATCCAT TCAGATATAG 
AGACCTTGCC TAACAAAGAT ACTATAGAGA AGCTAGAGGA ATTGGCAGCC ATCTTATCGA 
TGGCTCTGCT CCTGGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 
GGGATTTTGT TCAGGGATTT ATAAGTTATG TAGGGTCTCA TTATAGAGAA GTGAACCTTG 
TATACCCTAG ATACAGCAAC TTCATATCTA CTGAATCTTA TTTGGTTATG ACAGATCTCA 
AGGCTAACCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCTGTGA 
GGACTTCACC TGGACTTATA GGTCACATCC TATCCATTAA GCAACTAAGC TGCATACAAG 
CAATTGTGGG AGACGCAGTT AGTAGAGGTG ATATCAATCC TACTCTGAAA AAACTTACAC 
CTATAGAGCA GGTGCTGATC AATTGCGGGT TGGCAATTAA CGGACCTAAG CTGTGCAAAG 
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AATTGATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAATTCT ATACTCATCC 15420 

TCTACAGGGA GTTGGCAAGA TTCAAAGACA ACCAAAGAAG TCAACAAGGG ATGTTCCACG 15480

CTTACCCCGT ATTGGTAAGT AGCAGGCAAC GAGAACTTAT ATCTAGGATC ACCCGCAAAT 15540 

TTTGGGGGCA CATTCTTCTT TACTCCGGGA ACAGAAAGTT GATAAATAAG TTTATCCAGA 15600

ATCTCAAGTC CGGCTATCTG ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT 15660 

CCAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTGAA ACGTGAGTGG GTTTTTAAGG 15720 

TAACAGTCAA GGAGACCAAA GAATGGTATA AGTTAGTCGG ATACAGTGCC CTGATTAAGG 15780 

ACTAATTGGT TGAACTCCGG AACCCTAATC CTGCCCTAGG TGGTTAGGCA TTATTTGCAA 15840

TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 15894 
(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2183 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY : linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu
1 5 10 15

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala
20 25 30

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn
35 40 45

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn
50 55 60

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro
65 70 75 80

Ala His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn
85 90 95

Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys
100 105 110

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asp Lys Val Phe Gln Cys Leu
115 120 125

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp
130 135 140

Ile Lys Glu Lys Val Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln
145 150 155 160

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg
165 170 175

Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr
180 185 190

Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp
195 200 205

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr
210 215 220

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met
225 230 235 240

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Thr Glu Leu Leu Gly
245 250 255

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu
260 265 270

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu
275 280 285

Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe
290 295 300

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly
305 310 315 320

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Ile Glu Ala Leu Asp Tyr
325 330 335

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe
340 345 350

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu
355 360 365

Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr
370 375 380

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr
385 390 395 400

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His
405 410 415

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr
420 425 430

His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe
435 440 445

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu
450 455 460

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr
465 470 475 480

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg
485 490 495

Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp
500 505 510

Val Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe
515 520 525

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg
530 535 540

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala
545 550 555 560

Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly
565 570 575

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala
580 585 590

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro
595 600 605

Val Leu Lys Thr Tyr Ser Arg Ser Pro Val His Thr Ser Thr Arg Asn
610 615 620

Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro Gln Val Ile Arg Gln
625 630 635 640

Asp Gln Asp Thr Asp His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val
645 650 655

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly
675 680 685

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val
690 695 700

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Ile
705 710 715 720

Pro Leu Tyr Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met
725 730 735

Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile
740 745 750

Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser
755 760 765

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro
770 775 780

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Arg Glu Ala Ala Arg Val Thr
785 790 795 800

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His
805 810 815

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr
820 825 830

Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys
835 840 845

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr
850 855 860

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu
865 870 875 880

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val
885 890 895

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met
900 905 910

Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile
915 920 925

Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn
930 935 940

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser
945 950 955 960

Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu
965 970 975

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser
995 1000 1005

Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His
1010 1015 1020

Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu
1025 1030 1035 1040

Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val
1045 1050 1055

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg
1060 1065 1070

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala
1075 1080 1085

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser
1090 1095 1100

Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly
1105 1110 1115 1120

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu
1125 1130 1135

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg
1140 1145 1150

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly
1155 1160 1165

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser
1170 1175 1180

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp
1185 1190 1195 1200

Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr
1205 1210 1215

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser
1220 1225 1230

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala
1235 1240 1245

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg
1250 1255 1260

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile
1265 1270 1275 1280

Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln
1285 1290 1295

Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr
1300 1305 1310

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp
1315 1320 1325

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu
1330 1335 1340

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val
1345 1350 1355 1360

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp
1365 1370 1375

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu
1380 1385 1390

Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp
1395 1400 1405

Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe
1410 1415 1420

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr
1425 1430 1435 1440

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met
1445 1450 1455

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile
1460 1465 1470

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly
1475 1480 1485

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro
1490 1495 1500

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg
1505 1510 1515 1520

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro
1525 1530 1535

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His
1540 1545 1550

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met
1555 1560 1565

Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu
1570 1575 1580

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val
1585 1590 1595 1600

Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala
1605 1610 1615

Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg
1620 1625 1630

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala
1635 1640 1645

Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val
1650 1655 1660

Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys
1665 1670 1675 1680

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala
1685 1690 1695

Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn
1700 1705 1710

Met Ser Ile Lys Asp Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu
1715 1720 1725

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly
1730 1735 1740

Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn
1745 1750 1755 1760

Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg
1765 1770 1775

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly
1780 1785 1790

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe
1795 1800 1805

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu
1810 1815 1820

Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val
1825 1830 1835 1840

Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp
1845 1850 1855

Val Gly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr
1860 1865 1870

Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asn Lys
1875 1880 1885

Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala
1890 1895 1900

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro
1905 1910 1915 1920

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His
1925 1930 1935

Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser
1940 1945 1950

Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met
1955 1960 1965

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr
1970 1975 1980

Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys
1985 1990 1995 2000

Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Asp Ile Asn Pro
2005 2010 2015

Thr Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly
2020 2025 2030

Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp
2035 2040 2045

Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr
2050 2055 2060

Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met
2065 2070 2075 2080

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile
2085 2090 2095

Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly
2100 2105 2110

Asn Arg Lys Leu Ile Asn Lys Phe Ile Gln Asn Leu Lys Ser Gly Tyr
2115 2120 2125

Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys
2130 2135 2140

Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val
2145 2150 2155 2160

Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly
2165 2170 2175

Tyr Ser Ala Leu Ile Lys Asp
2180

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15894 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

ACCAAACAAA GTTGGGTAAG GATAGATCAA TCAATGATCA TATTCTAGTA CACTTAGGAT 60 

TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAGGGAT ATCCGAGATG GCCACACTTC 120 

TAAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 180 

GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATCCCGGGA GATTCCTCAA 240

TTACCACTCG ATCTAGACTT CTGGACCGGT TGGTCAGGTT AATTGGAAAC CCGGATGTGA 300

GCGGGCCCAA ACTAACAGGG GCACTAATAG GTATATTATC CTTATTTGTG GAGTCTCCAG 360

GTCAATTGAT TCAGAGGATC ACCGATGACC CTGACGTTAG CATAAGGCTG TTAGAGGTTG 420 

TCCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG 480 

ATGAGGCGGA CCAATATTTT TCACATGATG ATCCAAGTAG TAGTGATCAA TCCAGGTTCG 540 

GATGGTTCGA GAACAAGGAA ATCTCAGATA TTGAAGTGCA AGACCCTGAG GGATTCAACA 600 

TGATTCTGGG TACCATCCTA GCTCAAATTT GGGTCTTGCT CGCAAAGGCG GTTACGGCCC 660 

CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG 720

TAGTTGGTGA ATTTAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG 780 

AGGACCTCTC CTTACGCCGA TTCATGGTCG CTCTAATCCT GGATATCAAG AGAACACCCG 840 

GGAACAAACC CAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG 900 

GATTAGCCAG TTTTATCCTG ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG 960 

GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAT CTTTACCAGC 1020

AAATGGGGGA AACTGCACCA TACATGGTAA TCCTGGAGAA CTCAATTCAG AACAAGTTCA 1080 

GTGCAGGATC ATACCCTCTG CTCTGGAGCT ATGCCATGGG AGTAGGAGTG GAACTTGAAA 1140

ACTCCATGGG AGGTTTGAAC TTTGGCCGAT CTTACTTCGA TCCAGCATAT TTCAGACTAG 1200 

GGCAAGAGAT GGTGAGGAGG TCAGCTGGAA AGGTCAGTTC CACATTGGCA TCTGAACTCG 1260 

GTATCACTGC CGAAGATGCA AGGCTTGTTT CAGAGATCGC AATGCATACT ACAGAGGACA 1320 

GGATCAGTAG AGCGGTTGGA CCCAGACAAT CCCAAGTGTC ATTCCTACAC GGTGATCAAA 1380

ATGAAAATGA GCTACCGAGA TGGGGGGOTA AGGAAGATAT GAGGGTCAAA CAGAGTCGGG 1440 

GAGAAGCCAG AGAGAGCTAC AGAGAAACCA GGCCCAGCAG AGCAAGTGAC GCGAGAGCTA 1500 

CCCATCCTCC AACCGACACA CCCTTAGACA TTGACACTGC ATCGGAGTCC AGCCAAGATC 1560 

CGCAGGACAG TCGAAGGTCA GCTGACGCCC TGCTCAGGCT GCAAGCCATG GCAGGAATCT 1620 

CGGAAGAACA AGGCTCAGAC ACGGACACCC CTAGAGTGTA CAATGACAGA GATCTTCTAG 1680 

ACTAGGTGCA AGAGGCCGAG GACCAGAACA ACATCCGCCT ACCCTCCATC ATTGTTATAA 1740 

AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCCACCA ACCATCCACT CCCACGATTG 1800

GGGCCGATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT 1860

CTCAAGGCCG AGCCCATCGG CTCACTGGCC ATCGAGGAAG CTATGGCAGC ATGGTCAGAA 1920

ATATCAGACA ACCCAGGACA GGAGCGAGCC GCCTGCAAGG AAGAGAAGGC AAGCAGTCCG 1980 

GGTCTCAGCA AACCATGCCT CTCAGCAATT GGATCAACTG AAGGCGGTGC ACCTCGCATC 2040 

CGCGGTCAGG GATCTGGAGA GAGCGATGAC GACGCTGAAA CTTTGGGAAT CCCCTCAGGA 2100 

AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATTATG TTTATGATCA CAGCGGTGAA 2160 

GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT 2220 

AGCACCCTCT CAGGAGGAGA CAATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT 2280 

GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG 2340 

GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAG ACTCCAATCC 2400 

AGAGGCAACA ACTTTCCAAA GCTTAGGAAA ACTCTCAATG TTCCCCCGCC CCCGGACCCT 2460 

GGTAGGGCCA GCACTTCCGA GACACCCATT AAAAAGGGCA CAGACGCGAG ATTAGCCTCA 2520 

TTTGGAACGG AGATCGCGTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA 2580 

CCCTCGGAAC CATCAGGGCC AGGTGCACCT GCGGGGAATG TCCCCGAGTG TGTGAGCAAT 2640 

GCCGTACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG 2700 

AATAATGAAG AAGGGGGAGA TTATTATGAT GATGAGCTGT TCTCTGATGT CCAAGATATT 2760 

AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCACCAA GCTAGAATCA 2820 

CTGCTGTTAT TGAAGGGGGA AGTTGAGTCA ATCAAGAAGC AGATCAACAG GCAAAATATC 2880 

AGCATATCCA CCTTGGAAGG ACACCTCTCA AGCATCATGA TCGCCATTCC TGGACTTGGG 2940 

AAGGATCCCA ACGACCCCAC TGCAGATGTC GAAATCAATC CCGACTTGAA ACCCATCATA 3000 

GGCAGAGATT CAGGCCGAGC ACTGGCTGAA GTTCTCAAGA AACCCGTTGC CAGCCGACAA 3060 

ATCCAAGGAA TGACAAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAG 3120 

CTAAAGCCGA TCGGGAAAAA GATGAGCTCA GCCGTCGGGT TTGTTCCGGA CACCGGCCCT 3180 

GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 3240 

CGTTACCTGA TGACTCTCCT TGATGACATC AAAGGAGCCA ACGATCTTGC CAAGTTCCAC 3300 

CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCCCATG 3360

CCAGTCGACC TAGCTAATAC AACCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT
GCCTCCCAAG TTCCACAATG ACAGAGATCT ACGACTTCGA CAAGTCGGCA TGGGACATCA 
AAGGGTCGAT CGCTCCGATA CAACCCACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG 
TCAGAGTCAT AGATCCTGGT CTAGGCGACA GAAAAGATGA ATGTTTTATG TACATGTTTC 
TGCTGGGGGT TGTTGAGGAC AGCGATCTCC TAGGGCCTCC AATCGGGCGA GCATTTGGGT 
CTCTGCCCTT AGGTGTTGGC AGATCCACAG CAAAACCCGA AGAACTCCTC AAAGAGGCCA 
CTGAGCTTGA CATAGTTGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA 
ACAACACCCC ACTAACTCTC CTCATACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT 
TCAACGCAAA CCAAGTGTGC AATGCGGTTA ATCTGATACC GCTGGATACC CCGCAGAGGT 
TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCAGATAA CGGGTATTAC ACCGTTCCTA 
GAAGAATGCT GOAATTCAGA TCGGTCAATG CAGTGGCCTT CAACCTGCTG GTGACCCTTA 
GGATTGACAA GGCGATTGGC CATGGGAAGA TCATCGACAA TGCAGAGCAA CTTCCTGAGG 
CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA AAGTGAAGTC TACTCTGCCG 
ATTATTGCAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG 
GCACCAGTCT TCACATTAGA AGCACAGGCA AAATGAGCAA GACTCTCCAT GCACAACTCG 
GGTTCAAGAA GACCCTATGT TACCCACTGA TGGATATCAA TOAAGACCTT AATCGATTAC 
TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCTC 
AAGAATTCCG CATTTACGAC GACGTTATCA TAAATGATGA CCAAGGATTA TTCAAAGTTC 
TGTAGACCGT AGTGCCCAGC AATGCCCGAA GACGACCCTC CTCACAATGA CAGCCAGAAG 
GCCCGGAAAA AAAGGCCCCC TCCGAAAGAC TCCAGAGACC AAATGAGAGG CCAGCCAGCA 
GCTGACGGCA AGCACGAACA CCAGGCGGCC CCAGCACAGA ACAGCCCTGA CATAAGGCCA 
CCACCAGCCA TCCCAATCTG CATCCTCCTC GTAGGACCCC CGAGGACCAA CCCCCAAGGT 
TGCCCCCCAC CCAAACCACC AACCGCATCC CTACCACCCC CGGGAAAGAA ACCCCCAGCA 
ACTGGAAGAG CCCTTCCCCT TTCCCTCAAC ACAAGAACTC CACAACCGAA CCACACAAGC 
GACCGAGGTG ACCCAACCGC AGGCACCCGA CTCCCTAGAC AGATCCTCTC CCCCTGGCAA 
ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCAAC AGAACCCAGA CCCCGGCCCA 



3420 
3480 
3540 
3600 
3660 
3720 
3780 
3840 
3900 
3960 
4020 
4080 
CGGCGCCGCG CCCCCAACCC CCGACAACCA GAGGGAGCCC CCAACCAATC CCGCCGGCTC 4980

CCCCGGTGCC CACAGGCAGG CACACCAACC CCCGAACAGA CCCAGCACCC AGCCATCGAC 5040 

AATCCAAGAC GGGGGGGCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCACCGC 5100 

GAGGAAGCCC ACCCACCCCA CACACGACCA CGACAACCAA ACCAGAACCC AGACCACCCT 5160 

GGGCCACCAG TTCCCAGACT CGGCCATCAC CCCGCAGAAA GGAAAGGCCA CAACCTGCGC 5220

ACCCCAGCCC CGATCCGGCG GGCAGCCACC CAACCCTAAC CAGCACCCAA GAGCGATCCC 5280 

CGAAGGACCC CCGAACCGCA AAGGACATCA GTATCCCACA GCCTCTCCAA GTCCCCCGGT 5340 

CTCTTCCTCT TCTCGAAGGG ACTAAAAGAT CAATCCACCA CATCCGACGA CACTCAACTC 5400 

CCCGTCCCTA AAGGAGACAC CGGGAATCCC GGAATTAAGA CTCATCCAAT GTCCATCATG 5460 

GGTCTCAAGG TGAACGTCTC TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAAACACCC 5520 

ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGAAT AGGAAGTGCA 5580 

AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTAG TCATAAAATT AATGCCCAAT 5640 

ATAACTCTCC TCAATAACTG CACGAGGGTA GAGATTGCAG AATACAGGAG ACTACTGAGA 5700 

ACAGTTTTGG AACCAATTAG AGATGCACTT AATGCAATGA CCCAGAATAT AAGACCGTTT 5760 

CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCAG GAGTAGTCCT GGCAGGTGCG 5820 

GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 5880 

CTGAACTCTC AAGCCATCGA CAATCTGAGG GCAAGTCTGG AAACTACTAA TCAGGCAATT 5940 

GAGGCAATCA GACAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC 6000 

ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCCAGAAG 6060 

CTCGGGCTCA AATTGCTCAG ATACTATACA GAAATCCTGT CATTATTTGG CCCTAGCTTA 6120 

CGGGACCCCA TATCTGCGGA GATATCTATC CAGGCTTTGA GCTATGCGCT CGGAGGAGAT 6180 

ATCAATAAGG TGTTAGAAAA GCTCGGATAT AGTGGAGGTG ATTTACTGGG CATCTTAGAG 6240 

AGCAGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTCCTC 6300 

AGTATAGCCT ACCCGACGCT GTCCGAGATC AAGGGGGTGA TTGTCCACCG GCTAGAGGGG 6360 

GTCTCGTACA ACATAGGCTC TCAAGAGTGG TATACGACTG TGCCCAAGTA TGTTGCAACC 6420 

CAAGGGTACC TTATCTCGAA TTTTGATGAG TCATCGTGTA CTTTCATGCC AGAGGGGACT 6480 
GTGTGCAGCC AAAATGCCTT GTACCCGATG AGTCCTCTGC TCCAAGAATG CCTCCGGGGG
TCCACCAAGT CCTGTGCTCG TACACTCGTA TCTGGGTCTT TTGGGAACCG GTTCATTTTG 
TCACAAGGGA ACCTAATAGC CAATTGTGCA TCAATCCTTT GCAAGTGTTA CACAACAGGA 
ACGATCATTA ATCAAGACCC TGACAAGATC CTAACATACA TTGCTGCCGA TCACTGCCCG
GTAGTCGAGG TGAACGGCGT GACCATCCAA GTCGGGAGCA GGAGGTATCC AGACGCTGTG
TACTTGCACA GAATTGACCT CGGTCCTCCC ATATCATTGG AGAGGTTGGA CGTAGGGACA
AATCTGGGGA ATGCAATTGC TAAGTTGGAG GATGCCAAGG AATTGTTGGA GTCATCGGAC
CAGATATTGA GGAGTATGAA AGGTTTGTCG AGCACTAGCA TAGTCTACAT CCTGATTGCA 
GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT 
AAGAAAAAGG GAGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACAGGA 
ACATCGAAAT CCTATGTAAG GTCGCTCTGA TCCTCTACAA CTCTTGGAAC ACAAATGTCC 
CACAAGTCTC CTCTTCGTCA TCAAGCAACC ACCGCATCCA GCATCAAGCC CACCTGAAAT 
TATCTCCGGC TCCCCTTTGG CCGAACAATA TCGGTAGTTA ATTAAAACTT AGGGTGCAAG 
ATCATCCACA ATGTCACCAC AACGAGACCG GATAAATGCC TTCTACAAAG ATAACCCCCA 
TCCCAAGGGA AGTAGGATAG TTATCAACAG AGAACACCTT ATGATTGATA GACCTTATGT 
TTTGCTGGCT GTTCTGTTCG TCATGTTTCT GAGCTTGATC GGGTTGCTAG CAATTGCAGG 
CATTAGACTT CATCGGGCAG CCATCTACAC CGCAGAGATC CATAAAAGCC TCAGCACCAA 
TCTAGATGTA ACTAACTCAA TTGAGCATCA GGTCAAGGAC GTGCTGACAC CACTCTTCAA 
AATCATCGGT GATGAAGTGG GCCTGAGGAC ACCTCAGAGA TTCACTGACC TAGTGAAATT 
CATCTCTGAC AAGATTAAAT TCCTTAACCC GGATAGGGAG TACGACTTCA GAGATCTCAC 
TTGGTGTATC AACCCGCCAG AGAGAATCAA ATTGGATTAT GATCAATACT GTGCAGATGT 
GGCTGCTGAA GAGCTCATGA ATGCATTGGT GAACTCAACT CTACTGGAGA CCAGAACAAC 
CAATCAGTTC CTAGCTGTCT CAAAGGGAAA CTGCTCAGGG CCCACTACAA TCAGAGGTCA 
ATTCTCAAAC ATGTCGCTGT CCCTGTTGGA CTTGTATTTA AGTCGAGGTT ACAATGTGTC 
ATCTATAGTC ACTATGACAT CCCAGGGAAT GTACGGGGGA ACTTACCTAG TGGAAAAGCC 
TAATCTGAGC AGCAAAGGGT CAGAGTTGTC ACAACTGAGC ATGTACCGAG TGTTTGAAGT 
AGGTGTTATC AGAAATCCGG GTTTGGCGGC TCCGGTGTTC CATATGACAA ACTATTTTGA 8100
GCAACCAGTC AGTAATGATC TCAGCAACTG TATGGTGGCT TTGGGGGAGC TCAAACTCGC 8160 

AGCCCTTTGT CACGGGGGAG ATTCTATCAC AATTCCCTAT CAGGGATCAG GGAAAGGTGT 8220 

CAGCTTTCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT 8280 

CCCCTTCTCA ACGGATGACC CAGTGATAGA CAGGCTTTAC CTCTCATCTC ACAGAGGTGT 8340 

TATCGCTGAC AATCAAGCAA AATGGGCTAT CCCGACAACA AGAACAGATG ACAAGTTGCG 8400 

AATGGAGACA TGCTTCCAGC AGGCGTGTAA GGGTAAAATC CAAGCACTCT GCGAGAATCC 8460

CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGAGTCTTGT CTGTTGATCT 8520

GAGTCTAACA GTTGAGCTTA AAATCAAAAT TGCTTCGGGA TTCGGGCCAT TGATCACACA 8580

CGGTTCAGGG ATGGACCTAT ACAAGTCCAA CCACAACAAT GAGTATTGGC TGACTATCCC 8640 

GCCAATGAAG AACCTAGCCC TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA 8700 

GGTTAGTCCC AACCTCTTCA CTGTCCCAAT TAAGGAAGCA GGCGAAGACT GCCATGCCCC 8760 

AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTGATCCT 8820

ACCTGGTCAA GATCTCCAAT ATGTTTTGGC AACCTACGAT ACTTCCAGGG TTGAACATGC 8880 

TGTGGTTTAT TACGTTTACA GCCCAAGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT 8940

GCCTATAAAG GGGATCCCCA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT 9000 

CTGGTGCCGT CACTTCTGTG TGCTTGCGGA CTCAGAATCT GGTGGACATA TCACTCACTC 9060

TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACCCGG GAAGATGGAA CCAATAGCAG 9120 

ATAGGGCTGC CAGTGAACCA ATCACATGAT GTCACCCAGA CATCAGGCAT ACCCACTAGT 9180 

GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTCCCC GTTATGGACT 9240

CGCTATCTGT CAACCAGATC TTATACCCCG AAGTTCACCT AGATAGCCCG ATAGTTACCA 9300 

ACAAGATAGT AGCCATCCTG GAGTATGCTC GAGTCCCTCA CGCTTACAGC CTGGAGGACC 9360 

CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTTTCCAAC CAAATGATTA 9420 

TAAACAATGT GGAAGTTGGG AATGTCATCA AGTCCAAGCT TAGGAGTTAT CCGGCCCACT 9480 

CTCATATTCC ATATCCAAAC TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA 9540 

CGAGGAAGAT CCGTGAACTC CTCAAAAAGG GAAATTCGCT GTACTCTAAA GTCAGTAATA 9600






AGGTTTTCCA ATGCTTGAGG GACACTAATT CACGGCTTGG TCTAGGCTCC GAATTGAGGG
AGGACATCAA GGAGAAAGTT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAATGGTTTG 
AGCCCTTTCT GTTTTGGTTT ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA 
CCCATACTTG CCATAGGAGG AGACACACAC CTGTATTCTT CACTGGTAGT TCAGTTGAGT 
TGCTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAAGA GTCTCAACAT GTATATTACC 
TAACATTTGA GCTGGTTTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA 
CTGCTATGAC CATTGATGCT AGATATACAG AGCTTCTAGG AAGAGTCAGA TACATGTGGA 
AATTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTATCAAATT GTAGCCATGC 
TGGAGCCTCT TTCACTTGCT TACCTGCAGC TGAGGGATAT AACGGTAGAA CTCAGAGGTG 
CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG 
ATGAAGGTAC TTATCACGAG TTAGTTGAAG CTCTAGATTA CATTTTCATA ACTGATGACA 
TACACCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG
CAGTAACGGC TGCTGAAAAT GTTAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG 
AGACTCTGAT GAAAGGTCAT GCCATATTTT GTGGAATCAT AATCAACGGC TATCGTGACA 
GGCACGGAGG CAGTTGGCCA CCGCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA 
ATGCTCAAGC CTCAGGTGAA GGATTAACAC ATGAGCAGTG CGTTGATAAC TGGAAATCTT 
TTGCTGGAGT GAAATTTGGC TGCTTCATGC CTCTTAGCCT GGATAGTGAT CTGACAATGT 
ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG 
AGTTCCTGCG TTACGACCCC CCCAAGGGAA CCGGGTCACG GAGGCTTGTA GATGTTTTCC 
TTAATGATTC GAGCTTTGAC CCATATGATA TGATAATGTA TGTTGTAAGT GGAGCTTACC 
TCCATGACCC TGAGTTCAAC CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG 
GTAGACTTTT TGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATT GCTGAAAATC 
TAATCTCAAA CGGGATTGGC AAATATTTTA AGGACAATGG GATGGCCAAG GATGAGCACG 
ATTTGACTAA GGCACTCCAC ACTCTGGCTG TCTCAGGAGT CCCTAAAGAT CTCAAAGAAA 
GTCACAGAGG GGGGCCAGTC CTAAAAACCT ACTCCCGAAG CCCAGCCCAC ACAAATACCA 
GGAACGTGAG GGCAGCAAAA GGGTTTATAG GGTTCCCTCA GATAATTCGG CAGGACCAAG 
ACACTAATCA TCCGGAGAAT ATGGAAGCTT ACGAGACAGT CAGTGCATTT ATCACAACTG
ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTGTTT GCACAGAGGC
TAAATGAGAT TTACGGATTA CCCTCATTTT TTCAGTGGCT GCATAAGAGG CTTGAGACCT
CTGTCCTGTA TGTAAGTGAC CCTCATTGCC CCCCCGACCT TGACGCCCAT ATCCCGTTAT 
GCAAAGTCCC CAATGACCAA ATCTTCATTA AGTACCCTAT GGGAGGTATA GAAGGGTATT 
GTCAGAAGCT GTGGACCATC AGCACCATTC CCTATTTATA CCTGGCTGCT TATGAGAGCG 
GAGTAAGGAT TGCTTCATTA GTGCAAGGGG ACAATCAGAC CATAGCTGTA ACAAAAAGGG
TACCCAGCAC ATGGCCTTAC AACCTTAAGA AATGGGAAGC TGCTAGAGTA ACTAGAGATT 
ACTTTGTAAT TCTTAGGCAA AGGCTACATG ACATTGGCCA TCACCTCAAG GCAAATGAGA 
CAATTGTTTC ATCACATTTT TTTGTTTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG 
TGTCCCAATC ACTCAAGAGC ATCGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG 
AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 
ATGACCGTTA CCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAG ATTCTGATCT
CTCTTGGCTT CACAATCAAT TCAACCATGA CCCAGGATGT AGTCATACCC CTCCTCACAA 
ACAACGACCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATTGGGGGG ATGAATTATC 
TGAATATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 
ATCTCAAGAG AATGATTCTC GCATCACTGA TGCCTGAAGA GACCCTCCAT CAAGTAATGA 
CACAGCAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC 
TTGTATGTGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCCTAA 
TCCACAGTCC AAACCCAATG TTAAAGGGAT TATTCCATGA TGACAGTAAA GAAGAGGACG 
AGGGACTGGC AGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCATGAAA 
TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTA GATACCACAA 
AAGGCCTGAT TCGAGCCAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT 
TGTCCAATTA TGACTATGAA CAATTCAGAG CAGGGATGGT GCTATTAACA GGAAGAAAGA 
GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCGAGAGCC CTAAGAAGCC 
ATATGTGGGC GAGGCTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC 
TAGAATCTAT GCGAGGCCAC CTTATTCGGC GTCATGAGAC ATGTGTCATC TGCGAGTGTG
GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA
AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA
TGAAGCTTGC CTTCGTAAGA GCCCCAAGTC GATCCTTGCG ATCTGCTGTT AGAATAGCAA
CAGTGTACTC ATGGGCTTAC GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG
CAAGGCAAAG GGCTAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCAACTT
CGACTAATTT AGCACATAGG TTGAGGGATC GTAGCACTCA AGTGAAATAC TCAGGTACAT 
CCCTTGTCCG AGTGGCAAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT 
CAGATAAGAA GGTTGATACT AACTTTATAT ACCAACAAGG AATGCTCCTA GGGTTGGGCG
TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACCGGATC ATCTAACACG GTATTACATC 
TTCACGTCGA AACAGATTGT TGCGTGATCC CAATGATAGA TCATCCCAGG ATACCCAGCT 
CTCGCAAGCT AGAGCTGAGG GCAGAGCTGT GTACCAACCC ATTGATATAT GATAATGCAC 
CTTTAATTGA CAGAGATGCA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTAG
AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTCT AGCTAAGTCC ACAGCACTAT 
CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA 
TAGGGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTTATAGAG CCAAGATTAT 
TCACTATCTA CTTGGGCCAG TGTGCGGCCA TCAATTGGGC ATTTGATGTA CATTATCATA 
GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCATC GTTCCTTTCT AGAATGAGCA 
AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT 
GGCACTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTGCACA 
CAACTGTGTG CAACATGGTT TACACATGCT ATATGACCTA CCTCGACCTG TTGTTGAATG 
AAGAGTTAGA AGAGTTTACA TTTCTTTTGT GTGAAAGTGA CGAGGATGTA GTACCGGACA 
GATTCGACAA CATCCAGGCA AAACACTTGT GTGTTCTGGC AGATTTGTAC TGTCAACCAG 
GGACCTGCCC ACCAATTCGA GGTCTAAGAC CGGTAGAGAA ATGTGCAGTT CTAACCGACC 
ATATCAAGGC GGAGGCTAGG TTATCTCCAG CAGGATCTTC GTGGAACATA AATCCAATTA 
TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTTCGGCG AGGATCGATC AAACAGATAA 
GATTGAGAGT TGATCCAGGA TTCATTTTCG ACGCCCTCGC TGAGGTAAAT GTCAGTCAGC 14340
CAAAGATCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGATTTCAGA CCCCCACACG 14400 
ATGATGTTGC AAAATTGCTC AAAGATATCA ATACAAGCAA GCACAATCTT CCCATTTCTG 14460 
GGGGCAATCT CGCCAATTAT GAAATCCATG CTTTCCGCAG AATCGGGTTG AACTCATCTG 14520 
CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAAG 14580 
ACGGCTTATT CTTGGGTGAG GGATCGGGTT CTATGTTGAT CACTTATAAG GAGATACTTA 14640 
AACTAAACAA GTGCTTCTAT AATAGTGGGG TCTCTGCCAA TTCTAGATCT GGTCAAAGGG 14700 
AATTAGCACC CTATCCCTCC GAAGTTGGCC TTGTCGAACA CAGAATGGGA GTAGGTAATA 14760 
TTGTCAAGGT GCTCTTTAAC GGGAGGCCCG AAGTCACATG GGTAGGCAGT GTAGATTGCT 14820 
TCAATTACAT AGTTAGTAAT ATCCCTACCT CTAGTGTGGG GTTTATCCAT TCAGATATAG 14880 
AGACCTTACC TAACAAAGAT ACTATAGAGA AGCTAGAGGA ATTGGCAGCC ATCTTATCGA 14940
TGGCTCTGCT CCTGGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 15000 
GGGATTTTGT TCAGGGATTT ATAAGTTATG TAGGGTCTCA TTATAGAGAA GTGAACCTTG 15060 

TATACCCCAG ATACAGCAAC TTCATATCTA CTGAATCTTA TTTGGTTATG ACAGATCTCA 15120 

AGGCTAACCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCTGTGC 15180 

GGACTTCACC TGGACTTATA GGTCACATCC TATCCATTAA GCAACTAAGC TGCATACAAG 15240 

CAATTGTGGG AGACGCAGTT AGTAGAGGTG ATATCAATCC TACTCTGAAA AAACTTACAC 15300 

CTATAGAGCA GGTGCTGATC AATTGCGGGT TGGCAATTAA CGGACCTAAA CTGTGCAAAG 15360 

AATTGATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAATTCT ATACTCATCC 15420 

TCTACAGGGA GTTGGCAAGA TTCAAGGACA ACCAAAGAAG TCAACAAGGG ATGTTCCACG 15480

CTTACCCCGT ATTGGTAAGT AGCAGGCAAC GAGAACTTAT ATCTAGAATC ACTCGCAAAT 15540 

TTTGGGGGCA CATTCTTCTT TACTCCGGGA ACAGAAAGTT GATAAATAAG TTTATCCAGA 15600 

ATCTCAAGTC CGGTTATCTG ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT 15660 

CCAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTGAA ACGTGAGTGG GTTTTTAAGG 15720 

TAACAGTCAA GGAGACCAAG GAATGGTATA AGTTAGTCGG ATACAGTGCC CTGATTAAGG 15780 

ACTAATTGGT TGAACTCCGG AACCCTAATC CTGCCCCAGG TGGTTAGGCA TTATTTGTAA 15840 
TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 15894



(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2183 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu
1 5 10 15

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala
20 25 30

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn
35 40 45

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn
50 55 60

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro
65 70 75 80

Ala His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn
85 90 95

Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys
100 105 110

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asn Lys Val Phe Gln Cys Leu
115 120 125

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp
130 135 140

Ile Lys Glu Lys Val Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln
145 150 155 160

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg
165 170 175

Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr
180 185 190






Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp
195 200 205

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr
210 215 220

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met
225 230 235 240

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Thr Glu Leu Leu Gly
245 250 255

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu
260 265 270

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu
275 280 285

Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe
290 295 300

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly
305 310 315 320

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Val Glu Ala Leu Asp Tyr
325 330 335

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe
340 345 350

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu
355 360 365

Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr
370 375 380

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr
385 390 395 400

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His
405 410 415

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr
420 425 430

His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe
435 440 445

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu
450 455 460

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr
465 470 475 480

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg
485 490 495

Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp
500 505 510

Met Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe
515 520 525

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg
530 535 540

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala
545 550 555 560

Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly
565 570 575

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala
580 585 590

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro
595 600 605

Val Leu Lys Thr Tyr Ser Arg Ser Pro Ala His Thr Asn Thr Arg Asn
610 615 620

Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro Gln Ile Ile Arg Gln
625 630 635 640

Asp Gln Asp Thr Asn His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val
645 650 655

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly
675 680 685

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val
690 695 700

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Ile
705 710 715 720

Pro Leu Cys Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met
725 730 735

Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile
740 745 750



Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser
755 760 765

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro
770 775 780

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Trp Glu Ala Ala Arg Val Thr
785 790 795 800

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His
805 810 815

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr
820 825 830

Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys
835 840 845

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr
850 855 860

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu
865 870 875 880

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val
885 890 895

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met
900 905 910

Thr Gln Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile
915 920 925

Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn
930 935 940

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser
945 950 955 960

Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu
965 970 975

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser
995 1000 1005

Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His
1010 1015 1020
Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu
1025 1030 1035 1040

Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val
1045 1050 1055

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg
1060 1065 1070

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala
1075 1080 1085

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser
1090 1095 1100

Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly
1105 1110 1115 1120

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu
1125 1130 1135

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg
1140 1145 1150

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly
1155 1160 1165

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser
1170 1175 1180

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp
1185 1190 1195 1200

Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr
1205 1210 1215

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser
1220 1225 1230

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala
1235 1240 1245

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg
1250 1255 1260

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile
1265 1270 1275 1280

Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln
1285 1290 1295
Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr
1300 1305 1310

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp
1315 1320 1325

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu
1330 1335 1340

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val
1345 1350 1355 1360

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp
1365 1370 1375

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu
1380 1385 1390

Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp
1395 1400 1405

Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe
1410 1415 1420

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr
1425 1430 1435 1440

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met
1445 1450 1455

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile
1460 1465 1470

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly
1475 1480 1485

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro
1490 1495 1500

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg
1505 1510 1515 1520

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro
1525 1530 1535

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His
1540 1545 1550

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met
1555 1560 1565

Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu
1570 1575 1580

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val 
1585 1590 1595 1600 

Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala
1605 1610 1615

Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg
1620 1625 1630

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala
1635 1640 1645

Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val
1650 1655 1660

Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys
1665 1670 1675 1680

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala
1685 1690 1695

Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn
1700 1705 1710

Met Ser Ile Lys Asp Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu
1715 1720 1725

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly
1730 1735 1740

Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn
1745 1750 1755 1760

Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg
1765 1770 1775

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly
1780 1785 1790

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe
1795 1800 1805

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu
1810 1815 1820

Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val
1825 1830 1835 1840

Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp
1845 1850 1855
Val Gly Ser Val Asp Cys Phe Asn Tyr Ile Val Ser Asn Ile Pro Thr
1860 1865 1870

Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asn Lys
1875 1880 1885

Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala
1890 1895 1900

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro
1905 1910 1915 1920

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His
1925 1930 1935

Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser
1940 1945 1950

Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met
1955 1960 1965

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr
1970 1975 1980

Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys
1985 1990 1995 2000

Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Asp Ile Asn Pro
2005 2010 2015

Thr Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly
2020 2025 2030

Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp
2035 2040 2045

Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr
2050 2055 2060

Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met
2065 2070 2075 2080

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile
2085 2090 2095

Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly
2100 2105 2110

Asn Arg Lys Leu Ile Asn Lys Phe Ile Gln Asn Leu Lys Ser Gly Tyr
2115 2120 2125
Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys
2130 2135 2140

Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val
2145 2150 2155 2160

Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly
2165 2170 2175

Tyr Ser Ala Leu Ile Lys Asp
2180

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15894 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

ACCAAACAAA GTTGGGTAAG GATAGATCAA TCAATGATCA TATTCTAGTA CACTTAGGAT 60 

TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAGGGAT ATCCGAGATG GCCACACTTT 120 

TAAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 180 

GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATCCCTGGA GATTCCTCAA 240 

TTACCACTCG ATCCAGACTA CTGGACCGGT TGGTCAGGTT AATTGGAAAC CCGGATGTGA 300 

GCGGGCCCAA ACTAACAGGG GCACTAATAG GTATATTATC CTTGTTTGTG GAGTCTCCAG 360 

GTCAATTGAT TCAGAGGATC ACCGATGACC CTGACGTTAG CATCAGGCTG TTAGAGGTTG 420 

TCCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG 480 

ATGAGGCGGA CCAATACTTT TCACATGATG ATCCAAGTAG TAGTGATCAA TCCAGGTCCG 540

GATGGTTCGA GAACAAGGAA ATCTCAGATA TTGAAGTGCA AGACCCTGAG GGATTCAACA 600

TGATTCTGGG TACCATTCTA GCCCAAATTT GGGTCTTGCT CGCGAAGGCG GTTACGGCCC 660

CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG 720

TAGTTGGTGA ATTCAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG 780
AGGACCTCTC CTTACGCCGA TTCATGGTCG CTCTAATCCT GGATATCAAG AGGACACCCG
GGAACAAACC AAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG 
GATTAGCCAG TTTTATCCTA ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG 
GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAT CTTTACCAGC 
AAATGGGAGA AACTGCACCC TACATGGTAA TCCTGGAGAA CTCAATTCAG AACAAGTTCA 
GTGCAGGATC ATACCCCCTG CTCTGGAGCT ATGCCATGGG AGTAGGGGTG GAACTTGAAA 
ACTCCATGGG AGGTTTGAAC TTTGGTCGAT CTTACTTTGA TCCAGCATAT TTTAGATTAG 
GGCAAGAGAT GGTGAGGAGG TCAGCTGGGA AAGTCAGTTC CACATTAGCA TCTGAACTCG 
GTATCACTGC TGAGGATGCA AGGCTTGTTT CAGAGATTGC AATGCACACT ACTGAGGACA 
GGACCAGTAG AGCGGTTGGA CCCAGACAAG CCCAAGTGTC ATTTCTACAC GGTGATCAAA 
GTGAGAATGA GCTACCAGGA TTGGGGGGCA AGGAAGATAG GAGGGTCAAA CAGAGTCGGG
GAGAAGCCAG GGAGAGCTAC AGAGAAACCG GGTCTAGCAG AGCAAGCGAT GCGAGAGCTG 
CCCATCTTCC AACCAGCGCA CCCCTAGACA TTGACACTGC ATCGGAGTCA GGCCAAGATC 
CGCAGGACAG TCGACGGTCA GCTGACGCCC TGCTCAGGCT GCAAGCCATG GCAGGAATCT 
TGGAAGAACA AGGCTCAGAC ACGGACACCC CTAGGGTGTA CAATGACAGA GATCTTCTAG 
ACTAGGTGCG AGAGGCCGAG GACCAGAACA ACATCCGCCT ACCCTCCATC ATTGTTATAA 
AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCAACCA ACCATCCACT CCTACGACTG 
GGGCCGATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT 
CTCAAGGCCG AGCCCATCGG CTCACTGGCC GTCGAGGAAG CCATGGCAGC ATGGTCACAA 
ATATCAGACA ACCCAGGACA GGACCGAACC ACCCGCAAGG AAGAGGAGGC AGGCAGTTCG 
GGTCTCAGCA AACCATGCCT CTCAGCAATT GGATCAACTG AAGGCAGTGC ACCTCGCATC 
TGCGGTCAGG GATCTGGAGA GAGCGATGAC AACGCTGAAA CTTTGGGAAT CCCCTCAAGA 
AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATCATG TTTATGATCA CAGCGGTGAA 
GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT 
AGCACCCTCT CAGGAGGAGA CGATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT 
GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG 
GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAG ACTCCAATCT
AGAGGCAACA ACTTCCCGAA GCTTGGGAAA ACTCTCAATG TTCCTCCGCC CCCGAACCCC
GGTAGGGCCA GCACTTCCGA GACACCCATT AAAAAGGGGA CAGACGCGAG ATTAGCCTCA 
TTTGGAGCGG AGATCGCGTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA 
CCCTCGGAAC CATCAGGGCC AGGTGCACCT GTGGGGAATG TCCCCGAGTG TGTGAGCAAT 
GCCGCACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG 
AATAATGAAG AAGGGGGAGA TTATTATGAT GATGAGCTGT TCTCCGATGT CCAAGACATC 
AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCTCCAA GCTAGAATCA 
CTGCTGTTAT TGAAGGGAGA AGTTGAGTCA ATTAAAAAGC AGATCAACAG GCAAAATATC 
AGCATATCCA CCCTGGAAGG ACACCTCTCA AGCATCATGA TCGCCATTCC TGGACTTGGG
AAGGATCCCA ACGACCCCAC TGCAGATGTC GAACTCAATC CCGACCTGAA ACCCATCATA 
GGCAGAGATT CAGGCCGAGC ACTGGCCGAA GTTCTCAAGA AACCCGTTGC CAGCCGACAA 
CTCCAAGGAA TGACAAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAA 
CTAAAGCCGA TCGGGAAAAA GATGAGCTCA GCCGTCGGGT TTGTTCCTGA CACCGGCCCC
GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 
CGTTACCTGA TGACTCTCCT TGATGATATC AAAGGAGCCA ACGATCTTGC CAAGTTCCAC 
CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCTCATG 
CCAATCGACC TAATTAGTAC AGCCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT
GCCTCCCAAG TTCCACAATG ACAGAGATCT ACGACTTCGA CAAGTCGGCA TGGGACATCA 
AAGGGTCGAT CGCTCCGATA CAACCTACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG 
TCAGAGTCAT AGATCCTGGT CTAGGCGACA GGAAGGATGA ATGCTTTACG TACATGTTTC 
TGCTGGGGGT TGTTGAGGAC AGCGATCCCC TAGGGCCTCC AATCGGGCGA GCATTTGGGT 
CCCTGCCCTT AGGTGTTGGT AGATCCACAG CAAAACCCGA AGAACTCCTC AAAGAGGCCA 
CTGAGCTTGA CATAGTCGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA 
ACAACACCCC ACTAACTCTC CTCACACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT 
TCAACGCAAA CCAAGTGTGC AATGCGGTTA ATCTGATACC GCTGGATACC CCGCAGAGGT 
TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCGGATAA CGGGTATTAC ACCGTTCCTA 3960

GAAGAATGCT AGAATTCAGA TCGGTCAATG CAGTGGCTTT CAACCTGCTG GTGACCCTTA 4020 

GGATTGACAA AGCGATTGGC CCTGGGAAGA TCATCGATAA TGCAGAGCAA CTTCCTGAGG 4080 

CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA GAGTGAAGTC TACTCTGCTG 4140 

ATTATTGCAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG 4200

GCACCAGTCT TCACATTAGA AGCACAGGCA AAATGAGCAA GACTCTCCAT GCACAACTCG 4260 

GGTTCAAAAA GACCTTATGT TACCCACTGA TGGATATCAA TGAAGACCTT AATCGATTAC 4320 

TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCCC 4380 

AAGAATTCCG CATTTACGAC GACGTGATCA TAAATGATGA CCAAGGACTA TTCAAAGTTC 4440 

TGTAGACCGT AGTGCCCAGC AATACCCGAA AACGACCCCC CTCATAATGA CAGCCAGAAG 4500 

GCCCGGACAA AAAAGCCCCC TCCAAAAGAC TCCACGGACC AAGTGAGAGG CCAGCCAGCA 4560 

GCTGACGGCA AGCGTGAACA CCAGGCGGCC TGGGCACAGA AGAGCCCCGA CACAAGGCAA 4620 

CCACCAGCCA TCCCAATCTG CGTCCTCCTC GTGGGACCCC CGAGGACCAA CCCCCAAGGT 4680 

CGCCCCCGAC CCAGACCACC AACCGCATCC CCACAGCCCC CGGGAAAGAG ACCCCCAGCA 4740 

ACTGGAAGGC CCCTCCCCCT TTCCCTCAAC GCAAGAACTC CACAACCGAA CCGCACAAGC 4800 

GATCGAGGTG ACCCAACCGC AGGCATCCGA CTCCCTAGAC AGATCCTCTC CCCCCGGCAA 4860 

ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCGAC AGAACCCAGA CCCCGGCCCA 4920 

CGGCGCCGCG CCCCCACCTC CCGACAACCA GAGGGAGCCC CCAACCAATC CCGCCGGCTC 4980 

CCCCGGTGCC CACAGGCAGG CACACCAACC CTCGAACAGA CCCAGCACCC AGCCATCGAC 5040 

AATTCAAGAC GGGGGGCCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCACCGC 5100 

GAGGAAGCCC ACCCACCCCA CACACGACCA CAGGAACCGA ACCAGAATCC AGACCACCCT 5160 

GGGCCACCAG TTCCCAGACT CGGCCATCAC CCCGCAGAAA GGAAAGGCCA CAACCCGCGC 5220 

ACCCCTGCCC TGATCCGGTG GGCGGCCACC GAACCCGAAC CAGCACCCAA GAGCGATCCC 5280 

CGAAGGGCCC CCGAACCGCA AAAGACATCA GTATCCCACA GCCTCTCCAA GTCCCCCGGT 5340 

CTCCCCCTCT TCTCGAAGGG ACCAAAAGAT CAATCCACCA CACCCGACGA CACTCAATTC 5400 

CCCACCCCTA AAGGAGACAC CGGGAATCCC AGAATCAAGA CTCATCCAAT GTCCATCATG 5460 
GGTCTCAAGG TGAACGTCTC TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAAACACCC 5520

ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGGAT AGGAAGTGCA 5580 

AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTAG TCATAAAATT AATGCCCAAT 5640 

ATAACTCTCC TCAATAACTG CACGAGGGTA GAGATTGCAG AATACAGGAG ACTACTGAGA 5700

ACAGTTTTGG AACCAATTAG AGATGCACTT AATGCAATGA CCCAGAATAT AAGACCGGTT 5760

CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCTG GAGTTGTCCT GGCGGGTGCG 5820 

GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 5880 

TTGAACTCTC AAGCCATCGA CAATCTGAGA GCGAGCCTGG AAACTACTAA TCAGGCAATT 5940

GAGGCAATCA GACAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC 6000 

ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCCAGAAG 6060 

CTAGGGCTCA AATTGCTCAG ATACTATACA GAAATCCTGT CACTATTTGG CCCCAGCTTA 6120

CGGGACCCCA TATCTGCGGA GATATCTATC CAGGCTTTGA GCTATGCGCT TGGAGGAGAT 6180 

ATCAATAAGG TGTTAGAAAA GCTCGGATAC AGTGGAGGTG ATTTACTGGG CATCTTAGAG 6240

AGCAGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTACTC 6300 

AGTATAGCCT ATCCGACGCT GTCCGAGATT AAGGGGGTGA TTGTCCACCG GCTAGAAGGG 6360 

GTCTCGTACA ACATAGGCTC TCAAGAGTGG TATACCACTG TGCCCAAGTA TGTTGCAACC 6420

CAAGGGTACC TTATCTCGAA TTTTGATGAG TCATCGTGTA CTTTCATGCC AGAGGGGACT 6480

GTGTGCAGCC AAAATGCCTT GTACCCGATG AGTCCTCTGC TCCAAGAATG CCTCCGGGGG 6540

TCCACCAAGT CCTGTGCTCG TACACTTGTA TCCGGGTCTT TTGGGAACCG GTTCATTTTA 6600 

TCACAAGGGA ATCTAATAGC CAATTGTGCA TCAATCCTTT GCAAGTGTTA CACAACAGGA 6660 

ACGATCATTA ATCAGGACCC TGACAAGATC CTAACATACA TTGCTGCCGA TCACTGCCCG 6720 

GTGGTCGAGG TGAACGGCGT GACCATCCAA GTCGGGAGCA GGCGGTATCC GGACGCTGTG 6780

TACTTGCACA GAATTGACCT CGGTCCTCCC ATATCATTGG AGAGGTTGGA CGTAGGGACA 6840 

AATCTGGGGA ATGCAATTGC TAAGTTGGAG GATGCCAAGG AATTGTTGGA GTCATCGGAC 6900 

CAGATATTGA GGAGTATGAA AGGTTTATCG AGCACTAGCA TAGTTTACAT CCTGATTGCA 6960 

GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT 7020
AACAAAAAGG GAGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACAGGA
ACATCAAAAT CCTATGTAAG GTCGCTCTGA TCCTCTACAA CTCTTGAAAC ACAAATGTCC 
CACAAGTCTC CTCTTCGTCA TCAAGCAACC ACCGCATCCA GCATCGAGCC CACCTGAAAT 
TGTCTCCGGA TTCCCTCTGG CCGAACAATA TCGGTAGTTA ATTAAAACTT AGGGTGCAAG 
ATCATCCACA ATGTCACCAC AACGAGACCG GATAAATGCC TTCTACAAAG ACAACCCCCA 
TCCTAGGGGA AGTAGGATAG TTATTAACAG AGAACATCTT ATGATTGATA GACCTTATGT 
TTTGCTGGCT GTTCTATTCG TCATGTTTCT GAGCTTGATC GGGTTGCTAG CCATTGCAGG 
CATAAGACTT CATCGGGCAG CCATCTACAC CGCAGAGATC CATAAAAGCC TCAGCACCAA 
TCTAGATGTA ACTAACTCAA TCGAGCATCA GGTCAAGGAC GTGCTGACAC CACTCTTCAA 
GATCATCGGT GATGAAGTGG GCCTGAGGAC ACCTCAGAGA TTCACCGACC TAGTGAAATT
CATCTCTGAC AAGATTAAAT TCCTTAATCC GGATAGGGAG TACGACTTCA GAGATCTCAC 
TTGGTGTATC AACCCGCCAG AGAGAATCAA ATTGGATTAT GATCAATACT GTGCAGATGT 
GGCTGCTGAA GAACTCATGA ATGCATTGGT GAACTCAACT CTACTGGAGG CCAGGGTAAC 
CAATCAGTTC CTAGCTGTCT CAAAGGGAAA CTGCTCAGGG CCCACTACAA TCAGAGGTCA 
ATTCTCAAAC ATGTCGCTGT CCCTGTTGGA CTTGTATTTA AATCGAGGTT ACAATGTGTC 
ATCTATAGTC ACTATGACAT CCCAGGGAAT GTACGGGGGA ACTTACCTAG TGGAAAAGCC 
TAATCTGAGC AGTAAAGGGT CAGAGTTGTC ACAACTGAGC ATGCACCGAG TGTTTGAAGT 
AGGTGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATTTTGA 
GCAACCAGTC AGTAATGATT TCAGCAACTG CATGGTGGCT TTGGGGGAGC TCAAATTCGC 
AGCCCTTTGT CACAGGGAAG ATTCTATCAC AATTCCCTAT CAGGGATCAG GGAAAGGTGT 
CAGCTTCCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT 
CCCCCTATCA ACGGATGATC CAGTGATAGA CAGGCTCTAC CTCTCATCTC ACAGAGGCGT 
TATCGCTGAC AATCAAGCAA AATGGGCTGT CCCGACAACA CGGACAGATG ACAAGTTGCG 
AATGGAGACA TGCTTCCAGC AGGCGTGTAA GGGTAAAATC CAAGCACTCT GCGAGAATCC 
CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGGGTCTTGT CTGTTAATCT 
GAGTCTGACA GTTGAGCTTA AAATCAAAAT TGCTTCAGGA TTCGGGCCAT TGATCACACA 
CGGTTCAGGG ATGGACCTAT ACAAATCCAA CCACAACAAT GTGTATTGGC TGACTATCCC 8640

GCCAATGAAG AACCTAGCCT TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA 8700

GGTTAGTCCC TACCTCTTCA CTGTTCCAAT TAAGGAAGCA GGCGAGGACT GCCATGCCCC 8760 

AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTGATTCT 8820 

ACCTGGTCAA GATCTCCAAT ATGTTTTGGC AACCTATGAT ACTTCCAGAG TTGAACATGC 8880 

TGTGGTTTAT TACGTTTACA GCCCAAGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT 8940 

GCCTATAAGG GGGGTCCCCA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT 9000 

CTGGTGCCGT CACTTCTGTG TGCTTGCGGA CTCAGAATCT GGTGGATATA TCACTCACTC 9060 

TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACTCGG GAAGATGGAA CCAACCGCAG 9120 

ATAGGGCTGC CAGTGAACCA ATCACATGAT GTCACCCAGA CATCAGGCAT ACCCACTAGT 9180 

GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTCCCC GTTATGGACT 9240 

CGCTATCTGT CAACCAGATC TTATACCCTG AAGTTCACCT AGATAGCCCG ATAGTTACCA 9300 

ATAAGATAGT AGCTATCCTG GAGTATGCTC GAGTCCCTCA CGCATACAGC CTGGAGGACC 9360 

CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTTTCCAAC CAAATGATTA 9420 

TAAACAATGT GGAAGTTGGG AATGTCATCA AGTCCAAGCT TAGGAGTTAT CCGACCCACT 9480 

CTCATATTCC ATATCCAAAT TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA 9540 

CAAGGAAGAT CCGTGAGCTC CTCAAAAAGG GAAATTCGCT GTACTCCAAA GTCAGTGATA 9600 

AGGTTTTCCA ATGCCTGAGG GACACTAACT CACGGCTTGG CCTAGGCTCC GAATTGAGGG 9660 

AGGACATCAA GGAGAAAATT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAATGGTTTG 9720 

AGCCCTTTCT GTTTTGGTTT ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA 9780 

CCCATACTTG CCATAGGAGG AGACACACAC CAGTATTCTT CACTGGTAGT TCAGTTGAGT 9840 

TGCTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAAGA GTCTCAACAT GTATATTACC 9900 

TGACGTTTGA ACTGGTCTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA 9960 

CCGCTATGAC CATTGATGCT AGGTATACAG AGCTTCTAGG AAGAGTCAGA TACATGTGGA 10020 

AACTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTACCAAATT GTAGCCATGC 10080 

TGGAGCCTCT TTCACTTGCT TACCTGCAGC TGAGGGATAT AACAGTAGAA CTCAGAGGTG 10140 
CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG
ATGAAGGTAC TTATCATGAG TTAATTGAAG CCCTAGATTA CATTTTCATA ACTGATGACA 
TACATCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG 
CAGTAACGGC TGCTGAAAAT GTTAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG 
AGACTCTGAT GAAAGGTCAT GCCATATTCT GTGGAATCAT AATCAACGGC TATCGTGACA
GGCACGGAGG CAGTTGGCCA CCCCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA 
ATGCTCAAGC TTCAGGTGAA GGGTTAACAC ATGAGCAGTG CGTTGATAAC TGGAAATCTT 
TTGCTGGAGT GAAATTTGGC TGCTTTATGC CTCTTAGCCT GGATAGTGAT CTGACAATGT 
ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG 
AGTTCCTGCG TTACGACCCT CCCAAAGGAA CTGGGTCACG GAGGCTTGTA AATGTTTTCC 
TTAATGATTC GAGCTTTGAC CCATATGACA TGATAATGTA TGTTGTAAGT GGAGCTTACC 
TCCATGACCC TGAGTTCAAC CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG 
GTAGACTTTT TGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATT GCTGAAAATC 
TAATCTCAAA CGGGATTGGC AATTATTTTA AGGACAATGG GATGGCCAAG GACGAGCACG 
ATTTGACTAA GGCACTCCAC ACTCTAGCTG TCTCAGGAGT CCCCAAAGAT CTCAAAGAAA
GTCACAGGGG GGGGCCAGTC TTAAAAACCC ACTCCCGAAG CCCAGTCCAC ACAAGTACCA 
AGAACGTGAG AGCAGCAAAA GGGTTTATAG GATTCCCTCA TGTAATTCGG CAGGACCAAG 
ACACTGATCA TCCGGAGAAT ATGGAGGCTT ACGAGACAGT CAGTGCATTT ATCACGACTG 
ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTATTT GCACAAAGGC 
TAAATGAGAT TTACGGATTA CCCTCATTTT TCCAGTGGCT GCATAAGAGG CTTGAAACCT 
CTGTCCTCTA TGTAAGTGAC CCTCATTGCC CCCCTGACCT TGACGCCCAT GTCCCGTTAT 
GCAAAGTCCC CAATGACCAA ATCTTCATTA AGTACCCTAT GGGAGGTATA GAAGGGTATT 
GTCAGAAGCT GTGGACCATC AGCACCATTC CCTATTTATA CCTGGCTGCT TATGAGAGCG 
GAGTAAGGAT TGCTTCGTTA GTGCAAGGGG ACAATCAGAC CATAGCCGTA ACAAAAAGGG 
TACCCAGCAC ATGGCCTTAC AACCTTAAGA AACGGGAAGC TGCTAGAGTA ACTAGAGATT
ACTTTGTAAT TCTTAGGCAA AGGCTACATG ACATAGGCCA TCACCTCAAG GCAAATGAGA 
CAATTGTCTC ATCACATTTT TTTGTCTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG 11760
TGTCCCAATC ACTCAAGAGC ATCGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG 11820
AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 11880
ATGACCGTTA CCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAA ATCCTGATCT 11940
CTCTTGGCTT CACAATCAAT TCAACCATGA CCCGGGATGT AGTCATACCC CTCCTCACAA 12000
ACAACGATCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATCGGGGGG ATGAATTATC 12060
TGAATATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 12120
ATCTCAAGAG AATGATTCTC TCATCACTAA TGCCTGAAGA GACCCTTCAT CAAGTAATGA 12180
CACAACAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC 12240
TTGTATGCGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCCTGA 12300
TCCATAGTCC AAACCCAATG TTAAAAGGGT TATTCCATGA TGACAGTAAA GAAGAGGACG 12360
AGGGACTGGC GGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCATGAAA 12420
TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTA GATACCACAA 12480
AAGGCCTGAT TCGAGCCAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT 12540
TGTCCAATTA TGACTATGAA CAATTTAGAG CAGGGATGGT GCTATTGACA GGAAGAAAGA 12600
GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCTAGAGCC CTAAGAAGCC 12660
ATATGTGGGC AAGGCTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC 12720
TAGAATCTAT GCGAGGCCAC CTTATTCGGC GCCATGAGAC ATGTGTCATC TGCGAGTGTG 12780
GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA 12840
AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA 12900
TGAAGCTTGC CTTCGTAAGA GCCCCAAGTC GATCCTTGCG ATCTGCTGTT AGAATAGCAA 12960
CAGTGTACTC ATGGGCTTAT GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG 13020
CAAGGCAAAG GGCCAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCAACTT 13080
CGACTAATTT AGCGCATAGG TTGAGGGATC GTACCACTCA AGTGAAATAC TCAGGTACAT 13140
CCCTTGTCCG AGTGGCAAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT 13200
CAGATAAGAA GGTTGATACT AACTTTATAT ACCAACAGGG AATGCTTCTA GGGTTGGGTG 13260
TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACCGGATC ATCTAACACG GTATTACATC 13320
TTCACGTCGA AACAGATTGT TGCGTGATCC CGATGATAGA TCATCCCAGG ATACCCAGCT 13380
CCCGCAAGCT AGAGCTTAGG GCAGAGCTAT GTACCAACCC ATTGATATAT GATAATGCAC 13440
CTTTAATTGA CAGAGATGCA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTGG 13500
AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTTT AGCTAAGTCC ACAGCACTAT 13560
CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA 13620
TAGGGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTTATAGAG CCAAGATTAT 13680
TCACTATCTA CTTGGGCCAG TGTGCAGCCA TCAATTGGGC ATTTGATGTA CATTATCATA 13740
GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCTTC GTTCCTTTCT AGAATGAGCA 13800
AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT 13860
GGCATTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTACACA 13920
CAACTGTGTG CAACATGATT TACACATGCT ATATGACCTA CCTCGACCTG TTGTTGAATG 13980
AAGAGTTAGA AGAGTTCACA TTTCTTCTGT GTGAAAGCGA CGAGGATGTA GTACCGGACA 14040
GATTCGACAA TATCCAGGCA AAACACTTGT GTGTTCTAGC AGATTTGTAC TGTCAACCAG 14100
GGACCTGCCC ACCAATTCGA GGTCTACGAC CTGTAGAGAA ATGTGCAGTT CTAACCGATC 14160
ATATCAAGGC AGAGGCTAGG TTATCTCCAG CAGGGTCTTC GTGGAACATA AATCCAATTA 14220
TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTCCGGCG AGGATCGATC AAACAGATAA 14280
GATTGAGAGT TGATCCAGGA TTCATTTTTG ACGCCCTCGC TGAGGTAAAT GTCAGTCAGC 14340
CAAAGATCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGATTTCAGA CCTCCACACG 14400
ATGATGTTGC AAAATTGCTC AAAGATATCA ACACAAGCAA GCACAATCTT CCCATTTCAG 14460
GGGGTAATCT CGCCAATTAT GAAATCCACG CTTTCCGCAG AATCGGGTTA AACTCATCCG 14520
CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAAG 14580
ACGGCTTGTT CTTGGGTGAG GGGTCGGGTT CTATGTTGAT CACTTATAAG GAGATACTAA 14640
AACTAAACAA GTGCTTCTAT AATAGTGGGG TTTCCGCCAA TTCTAGATCT GGTCAAAGGG 14700
AATTAGCACC CTATCCCTCC GAAGTTGGTC TTGTCGAACA CAGAATGGGA GTAGGTAATA 14760
TTGTCAAAGT GCTCTTTAAC GGGAGGCCCG AAGTCACGTG GGTAGGCAGT GTAGATTGCT 14820
TCAATTTCAT AGTCAGTAAT ATCCCTACCT CTAGTGTGGG GTTTATCCAT TCAGATATAG 14880
AGACCTTACC TAACAAAGAT ACTATAGAGA AGCTAGAGGA ATTAGCAGCC ATCTTATCGA 14940
TGGCTCTGCT CCTTGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 15000
GGGATTTTGT TCAGGGATTT ATAAGTTATG TAGGGTCTTA TTATAGAGAA GTGAACCTTG 15060
TCTACCCTAG ATACAGCAAC TTCATATCTA CTGAATCTTA TTTAGTCATG ACAGATCTCA 15120
AAGCTAAGCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCTGTGC 15180
GGACTTCACC TGGACTTATA GGTCACATCC TATCCATTAA GCAACTAAGC TGCATACAAG 15240
CAATTGTGGG AGACGCAGTT AGTAGAGGTG GTATCAACCC TATTCTGAAG AAACTTACAC 15300
CTATAGAGCA GGTGCTGATC AATTGCGGGT TGGCAATTAA CGGACCTAAA CTGTGCAAAG 15360
AATTGATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAACTCT ATACTCATCC 15420
TCTACAGGGA GTTGGCAAGA TTCAAAGACA ACCAAAGAAG TCAACAAGGG ATGTTCCATG 15480
CTTACCCCGT ATTGGTAAGT AGCAGGCAAC GAGAACTTAT ATCTAGGATC ACCCGCAAAT 15540
TTTGGGGGCA TATTCTTCTT TACTCCGGGA ACAGAAAGTT GATAAATCGG TTTATCCAGA 15600
ATCTCAAGTC CGGTTACCTG ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT 15660
CTAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTAAA ACGTGAGTGG GTTTTTAAGG 15720
TAACAATCAA GGAGACCAAA GAATGGTATA AGTTAGTCGG ATACAGTGCC CTGATTAAGG 15780
ATTAATTGGT TGGACTCCGG GACCCTAATC CTGCCCTAGG TAGTTAGGCA TTATTTGCAA 15840
TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 15894

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2183 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu
1 5 10 15

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala
20 25 30

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn
35 40 45

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn
50 55 60

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro
65 70 75 80

Thr His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn
85 90 95

Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys
100 105 110

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asp Lys Val Phe Gln Cys Leu
115 120 125

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp
130 135 140

Ile Lys Glu Lys Ile Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln
145 150 155 160

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg
165 170 175

Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr
180 185 190

Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp
195 200 205

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr
210 215 220

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met
225 230 235 240

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Thr Glu Leu Leu Gly
245 250 255

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu
260 265 270

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu
275 280 285



Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe
290 295 300

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly
305 310 315 320

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Ile Glu Ala Leu Asp Tyr
325 330 335

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe
340 345 350

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu
355 360 365

Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr
370 375 380

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr
385 390 395 400

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His
405 410 415

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr
420 425 430

His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe
435 440 445

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu
450 455 460

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr
465 470 475 480

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg
485 490 495

Arg Leu Val Asn Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp
500 505 510

Met Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe
515 520 525

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg
530 535 540

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala
545 550 555 560
Glu Asn Leu Ile Ser Asn Gly Ile Gly Asn Tyr Phe Lys Asp Asn Gly
565 570 575

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala
580 585 590

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro
595 600 605

Val Leu Lys Thr His Ser Arg Ser Pro Val His Thr Ser Thr Lys Asn
610 615 620

Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro His Val Ile Arg Gln
625 630 635 640

Asp Gln Asp Thr Asp His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val
645 650 655

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly
675 680 685

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val
690 695 700

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Val
705 710 715 720

Pro Leu Cys Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met
725 730 735

Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile
740 745 750

Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser
755 760 765

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro
770 775 780

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Arg Glu Ala Ala Arg Val Thr
785 790 795 800

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His
805 810 815

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr
820 825 830
Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys
835 840 845

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr
850 855 860

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu
865 870 875 880

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val
885 890 895

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met
900 905 910

Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile
915 920 925

Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn
930 935 940

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser
945 950 955 960

Ile Ala Asp Leu Lys Arg Met Ile Leu Ser Ser Leu Met Pro Glu Glu
965 970 975

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser
995 1000 1005

Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His
1010 1015 1020

Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu
1025 1030 1035 1040

Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val
1045 1050 1055

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg
1060 1065 1070

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala
1075 1080 1085

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser
1090 1095 1100

Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly
1105 1110 1115 1120

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu
1125 1130 1135

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg
1140 1145 1150

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly
1155 1160 1165

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser
1170 1175 1180

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp
1185 1190 1195 1200

Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr
1205 1210 1215

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser
1220 1225 1230

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala
1235 1240 1245

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg
1250 1255 1260

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile
1265 1270 1275 1280

Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Thr Thr Gln
1285 1290 1295

Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr
1300 1305 1310

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp
1315 1320 1325

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu
1330 1335 1340

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val
1345 1350 1355 1360

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp
1365 1370 1375

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu
1380 1385 1390
Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp
1395 1400 1405

Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe
1410 1415 1420

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr
1425 1430 1435 1440

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met
1445 1450 1455

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile
1460 1465 1470

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly
1475 1480 1485

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro
1490 1495 1500

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg
1505 1510 1515 1520

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro
1525 1530 1535

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His
1540 1545 1550

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met
1555 1560 1565

Ile Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu
1570 1575 1580

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val
1585 1590 1595 1600

Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala
1605 1610 1615

Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg
1620 1625 1630

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala
1635 1640 1645

Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val
1650 1655 1660
Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys
1665 1670 1675 1680

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala
1685 1690 1695

Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn
1700 1705 1710

Met Ser Ile Lys Asp Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu
1715 1720 1725

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly
1730 1735 1740

Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn
1745 1750 1755 1760

Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg
1765 1770 1775

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly
1780 1785 1790

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe
1795 1800 1805

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu
1810 1815 1820

Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val
1825 1830 1835 1840

Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp
1845 1850 1855

Val Gly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr
1860 1865 1870

Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asn Lys
1875 1880 1885

Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala
1890 1895 1900

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro
1905 1910 1915 1920

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser Tyr
1925 1930 1935

Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser
1940 1945 1950



Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met
1955 1960 1965

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr
1970 1975 1980

Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys
1985 1990 1995 2000

Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Gly Ile Asn Pro
2005 2010 2015

Ile Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly
2020 2025 2030

Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp
2035 2040 2045

Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr
2050 2055 2060

Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met
2065 2070 2075 2080

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile
2085 2090 2095

Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly
2100 2105 2110

Asn Arg Lys Leu Ile Asn Arg Phe Ile Gln Asn Leu Lys Ser Gly Tyr
2115 2120 2125

Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys
2130 2135 2140

Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val
2145 2150 2155 2160

Phe Lys Val Thr Ile Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly
2165 2170 2175

Tyr Ser Ala Leu Ile Lys Asp
2180



(2) INFORMATION FOR SEQ ID NO: 7: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15894 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

ACCAAACAAA GTTGGGTAAG GATAGATCAA TCAATGATCA TATTCTAGTA CACTTAGGAT 60

TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAGGGAT ATCCGAGATG GCCACACTTT 120

TGAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 180 

GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATTCCTGGA GATTCCTCAA 240 

TTACCACTCG ATCCAGACTA CTGGACCGGT TGGTCAGGTT AATTGGAAAC CCGGATGTGA 300 

GCGGGCCCAA ACTAACAGGG GCACTAATAG GTATATTATC CTTATTTGTG GAGTCTCCAG 360 

GTCAATTGAT TCAGAGGATC ACCGATGACC CTGACGTTAG CATCAGGCTG TTAGAGGTTG 420 

TTCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG 480 

ATGAGGCGGA CCAATACTTT TCACATGATG ATCCAAGCAG TAGTGATCAA TCCAGGTCCG 540 

GATGGTTCGA GAACAAGGAA ATCTCAGATA TTGAAGTGCA AGATCCTGAG GGATTCAACA 600 

TGATTCTGGG TACCATTCTA GCCCAGATCT GGGTCTTGCT CGCAAAGGCG GTTACGGCCC 660 

CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG 720 

TAGTTGGTGA ATTTAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG 780 

AGGACCTCTC TTTACGCCGA TTCATGGTGG CTCTAATCCT GGATATCAAG AGGACACCCG 840 

GGAACAAACC TAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG 900 

GATTAGCCAG TTTTATCTTG ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG 960 

GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAT CTTTACCAGC 1020 

AAATGGGAGA AACTGCACCC TACATGGTAA TCCTAGAGAA CTCAATTCAG AACAAGTTCA 1080 

GCGCAGGATC ATACCCTCTG CTCTGGAGCT ATGCCATGGG AGTAGGAGTG GAACTTGAAA 1140 

ACTCCATGGG AGGTTTGAAC TTTGGTCGAT CTTACTTTGA TCCAGCATAT TTTAGATTAG 1200 

GGCAAGAGAT GGTGAGGAGG TCAGCTGGAA AGGTCAGTTC CACATTGGCA TCCGAACTCG 1260
GTATCACTGC CGAGGATGCA AGGCTTGTTT CAGAGATTGC AATGCATACT ACTGAGGACA
GGATCAGTAG AGCGGTCGGA CCCAGACAAG CCCAAGTATC ATTTCTACAC GGTGATCAAA 
GTGAGAATGA GCTACCAGGA TTGGGGGGCA AGGAAGACAG GAGGGTCAAA CAGAGTCGGG
GAGAAGCCAG GGAGAGCTAC AGAGAAACCG AGTCCAGCAG AGCAAGTGAT GCGAGAGCTG 
CCCATCCTCC AACCAGCATG CCCCTAGACA TTGACACTGC ATCGGAGTCA GGCCAAGATC 
CGCAGGACAG TCGAAGGTCA GCTGACGCTC TGCTCAGGCT GCAAGCCATG GCAGGAATCT 
TGGAAGAACA AGGCTCAGAC ACGGACACCC CTAGGGTATA CAATGACAGA GATCTTCTAG
ATTAGGTGCG AGAGGCCGAG GACCAGAACA ACATCCGCCT ACCCTCCATC ATTGTTATAA 
AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCAACCA ACCATCCACT CCCACGACTG 
GAGCCGATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT 
CTCAAGGCCG AGCCCATCGG CTCACTGGCC GTCGAGGAAG CCATGGCAGC ATGGTCAGAA 
ATATCAGACA ATCCAGGACA GGACCGAGCC GCCTGCAAGG AAGAGGAGGC AGGCAGTTCG 
GGTCTCAGCA AACCATGCTT CTCAGCAATT GGATCAACTG AAGGCGGTGC ACCTCGCATC 
CGCGGTCAGG GATCTGGAGA AAGCGATGAC GACGCTGAAA CTTTGGGAAT CCCCTCAAGA 
AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATCATG TTTATGATCA CAGCGGTGAA 
GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT 
AGCACCCTCT CAGGAGGAGA CGATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT 
GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG 
GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAA ACTCCAATCC 
AGAGGCAACA ACTTTCCGAA GCTTGGGAAA ACTCTCAATG TTCCTCCGCC CCCGAACCCC
AGTAGGGCCA GCACTTCCGA GACACCCATT AAAAAGGGGA CAGACGCGAG ATTGGCCTCA 
TTTGGAACGG AGATCGCGTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA 
CCCTCGGAAC CGTCAGGGCC AGATGCACCT GCGGGGAATG TCCCCGAGTG TGTGAGCAAT 
GCCGCACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG
AATAATGAAG AAGGGGGAGA CTATTATGAT GATGAGCTGT TCTCCGATGT CCAAGACATC 
AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCTCCAA GCTAGAATCA 
TTGCTGTTAT TGAAGGGAGA AGTTGAGTCA ATTAAGAAGC AGATCAACAG GCAAAATATC 2880

AGCATATCCA CCCTGGAAGG ACACCTCTCA AGCATCATGA TTGCCATTCC TGGACTTGGG 2940 

AAGGATCCCA ACGACCCCAC TGCAGATGTC GAACTCAATC CCGACCTGAA ACCCATCATA 3000 

GGCAGAGATT CAGGCCGAGC ACTGGCCGAA GTTCTCAAGA AGCCCGTTGC CAGCCGACAA 3060 

CTCCAGGGAA TGACTAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAA 3120 

CTAAAGCCGA TCGGGAAAAA GGTGAGCTCA GCCGTCGGGT TTGTCCCTGA CACCGGCCCT 3180 

GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 3240 

CGTTACCTGT TGACTCTCCT TGATGATATC AAAGGAGCCA ACGATCTTGC CAAGTTCCAC 3300

CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCCCATG 3360 

CCAGTCGACC TAATTAGTAC AACCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT 3420 

GCCTCCTAAG TTCCACAATG ACAGAGATCT ACGACTTCGA CAAGTCGGCA TGGGACATCA 3480 

AAGGGTCGAT CGCTCCGATA CAACCTACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG 3540 

TCAGAGTCAT AGATCCTGGT CTAGGTGATA GGAAGGATGA ATGCTTTATG TACATGTTTC 3600

TGCTGGGGGT TGTTGAGGAC AGAGATCCCC TAGGGCCTCC AATCGGGCGA GCATTCGGGT 3660 

CCCTGCCCTT AGGTGTTGGT AGATCCACAG CAAAACCCGA GGAACTCCTC AAAGAGGCCA 3720

CTGAGCTTGA CATAGTTGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA 3780 

ACAACACCCC ACTAACCCTC CTCACACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT 3840 

TCAATGCAAA CCAAGTGTGC AATGCGGTTA ATCTAATACC GCTGGACACC CCGCAGAGGT 3900 

TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCGGATAA CGGGTATTAC ACCGTTCCCA 3960 

GAAGAATGCT GGAATTCAGA TCGGTCAATG CAGTGGCCTT CAACCTGCTA GTGACCCTCA 4020 

GGATTGACAA GGCGATTGGC CCTGGGAAGA TCATCGACAA TGCAGAGCAA CTTCCTGAGG 4080 

CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA GAGTGAAGTC TACTCTGCCG 4140 

ATTATTGCAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG 4200 

GCACCAGTCT TCACATTAGA AGCACAGGCA AAATGAGCAA GACTCTCCAT GCACAACTCG 4260 

GGTTCAAGAA GACCTTATGT TACCCACTGA TGGATATCAA TGAAGACCTT AATCGGTTAC 4320 

TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCTC 4380
AAGAATTCCG CATTTACGAC GACGTGATCA TAAATGATGA CCAAGGACTA TTCAAAGTTC 4440

TGTAGACCGT AGTGCCCAGC AATACCCGAA AACGACCCCC CTCATAATGA CAGCCAGAAG 4500

GCCCGGACAA AAAAGCCCCC TCCAAAAGAC TTCACGGACC AAGCGAGAGG CCAGCCAGCA 4560 

GCCGACAGCA AGTGTGGACA CCAGGCGGCC CAAGCACAGA ACAGCCCCGA CACAAGGCCA 4620

CCACCAGCCA TCCCAATCCG CGTCCTCCTC GTAGGACCCC CGAGGACCAA CCCCCAAGGT 4680

CGCTCCGGAC ACAGACCACC AGCCGCATCC CCACAGCCCT CGGGAAAGGA ACCCCCAGCA 4740 

ACTGGAAGGC CCCTTCCCCC CTCCCCCAAC GCAAGAACCC CACAACCGAA CCGCACAAGC 4800 

GACCGAGGTG ACCCAACCGC AGGCATCCGA CTCCCTAGAC AGACCCTCCC TCCCCGGCAT 4860 

ACTAAACAAA ACTTAGGGCC AAGGAACACA CACACCCGAC AGAACCCAGA CCCCGGCCCG 4920

CGGCACCGCG CCCCCACCCC CCGAAAACCA GAGGGAGCCC CCAACCAATC CCGCCGCCCC 4980 

CCCCGGTGCC CACAGGTAGG CACACCAACC CCCGAACAGA CCCAGCACCC AGCCACCGAC 5040 

AATCCAAGAC GGGGGGCCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCATCGC 5100 

GAGGAAGCCC ACCCACCCCA CACACGACCA CGGCAACCAA ACCAGAGCCC AGACCACCCT 5160 

GGGCCACCAG CTCCCAGACT CGGCCATCAC CCCGAAAAAA GGAAAGGCCA CAACCCGCGC 5220 

ACCCCAGGCC CGATCCGGCG GGAAGCCACC CAACCCGAAC CAGCACCCAA GAGCGATCCC 5280

TGGGGGACCC CCAAACCGCA AAAGACATCA GTATCCCACC GCCTCTCCAA GTCCCCCGGT 5340 

CTCCTCCTCT TCTCGAAGGG ACCAAAAGAT CAATCCACCA CATCCGACGA CACTCAATTC 5400 

CCCACCCCTA AAGGAGACAC CGGGAATCCC AGAATCAAGA CTCATCCAAT GTCCATCATG 5460 

GGTCTCAAGG TGAATGTCTT TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAAACACCC 5520 

ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGGAT AGGAAGTGCA 5580 

AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTGG TCATAAAATT AATGCCCAAT 5640 

ATAACTCTCC TCAATAACTG CACGAGGGTA GAAATTGCAG AATACAGGAG ACTACTGAGA 5700 

ACAGTTTTGG AACCAATTAG AGATGCACTT AATGCAATGA CCCAGAATAT AAGACCGGTT 5760 

CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCGG GAGTTGTCCT GGCAGGTGCG 5820 

GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 5880 

CTGAACTCTC AAGCCATCGA CAATCTGAGA GCAAGCCTGG AAACTACTAA TCAGGCAATT 5940 
GAGGCAATCA GGCAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC
ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCCAGAAG 
CTAGGGCTCA AATTGCTCAG ATACTATACA GAAATCCTGT CATTATTTGG CCCCAGCTTA 
CGGGACCCCA TATCTGCGGA GATATCCATC CAGGCTTTGA GCTATGCGCT TGGGGGAGAT 
ATCAATAAGG TATTAGAAAA GCTCGGATAC AGTGGAGGTG ATTTACTGGG CATCTTAGAG
AGCAGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTCCTC 
AGTATAGCCT ATCCGACGCT GTCCGAGATT AAGGGGGTGA TTGTCCACCG GCTAGAGGGG 
GTCTCGTACA ATATAGGCTC TCAAGAGTGG TATACCACTG TGCCCAAGTA TGTTGCAACC 
CAAGGGTACC TTATCTCGAA TTTTGATGAG TCATCGTGTA CTTTCATGCC AGAGGGGACT 
GTGTGCAGCC AAAATGCCTT GTACCCGATG AGTCCTCTGC TCCAAGAATG CCTCCGGGGG 
TCCACCAAGT CCTGTGCTCG TACACTCGTA TCCGGGTCTT TTGGGAACCG GTTCATTTTA 
TCACAAGGGA ACCTAATAGC CAATTGTGCA TCAATCCTCT GCAAGTGTTA CACAACAGGA 
ACGATCATTA ATCAAGACCC TGACAAGATC CTAACATACA TTGCTGCCGA TCACTGCCCG 
GTGGTCGAGG TGAACGGTGT GACCATCCAA GTCGGGAGCA GGAGGTATCC GGACGCGGTG 
TACCTGCACA GAATTGACCT CGGTCCTCCC ATATCATTGG AGAAGTTGGA CGTAGGGACA
AATCTGGGGA ATGCAATTGC TAAGCTGGAG GATGCCAAGG AATTGCTGGA GTCATCGGAC
CAGATATTGA GGAGTATGAA AGGTTTATCG AGCACTAGCA TAGTTTACAT CCTGATTGCA
GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT 
AACAAAAAGG GGGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACAGGG 
ACATCAAAAT CCTATGTAAG GTCGCTCTGA TCCCCTACAA CTCTTGAAAC ACAGATTTCC 
CACAAGTCTC CTCTCCGTCA TCAAGCAACC ACCGCATCCA GCATCAAGGC CACCCGAAAT 
TGTCTCCGGC TTCCCTCTGG CCGAACGATA TCGGTAGTTA ATTAAAACTT AGGGTGCAAG 
ATCATCCACA ATGTCACCAC ACCGAGACCG AATAAATGCC TTCTACAAAG ACAACCCCCA 
TCCTAAGGGA AGTAGGATAG TTATTAACAG AGAACATCTT ATGATTGATA GACCTTATGT
TTTGCTGGCT GTTCTATTCG TCATGTTTCT GAGCTTGATC GGGTTGCTAG CCATTGCAGG 
CATTAGACTC CATCGGGCAG CCATCTACAC CGCAGAGATC CATAAGAGCC TCAGCACCAA 
TCTAGATGTA ACTAACTCAA TCGAGCATCA GGTCAAGGAC GTGCTGACAC CACTCTTCAA
GATCATCGGT GATGAAGTGG GCCTGAGGAC ACCTCAGAGA TTCACTGACC TAGTGAAATT
CATCTCTGAC AAAATTAAAT TCCTTAATCC GGATAGGGAG TACGACTTCA GAGATCTCAC
TTGGTGTATC AACCCGCCAG AGAGAATCAA ATTGGATTAT GATCAATACT GTGCAGATGT
GGCTGCTGAA GAACTCATGA ATGCATTGGT GAACTCAACT CTACTGGAGG CCAGGGCAAC
CAATCAGTTC CTAGCTGTCT CAAAGGGAAA CTGCTCAGGG CCCACTACAA TCAGAGGTCA
ATTCTCAAAC ATGTCGCTGT CCCTGTTGGA CTTGTATTTA AGTCGAGGTT ACAATGTGTC
ATCTATAGTC ACCATGACAT CCCAGGGAAT GTACGGGGGA ACTTACCTAG TGGGAAAGCC
TAATCTGAGC AGTAAAGGGT CAGAGTTGTC ACAACTGAGC ATGCACCGAG TGTTTGAAGT
AGGGGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATTTTGA
GCAACCAGTC AGTAATGATT TCAGCAACTG CATGGTGGCT TTGGGGGAGC TCAGGTTCGC
AGCCCTCTGT CACAGGGAAG ATTCTGTCAC GGTTCCCTAT CAGGGGTCAG GGAAAGGTGT
CAGCTTCCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT
CCCCCTATCA ACGGATGATC CAGTGATAGA TAGGCTTTAC CTCTCATCTC ACAGAGGTGT
TATCGCTGAC AATCAAGCAA AATGGGCTGT CCCGACAACA CGGACAGATG ACAAGTTGCG
AATGGAGACA TGCTTCCAGC AGGCGTGTAA GGGTAAAAAC CAAGCACTCT GCGAGAATCC
CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGGGTCTTGT CTGTTAATCT
GAGTCTGACA GTTGAGCTTA AAATCAAAAT TGCTTCAGGA TTCGGGCCAT TGATCACACA
CGGTTCAGGG ATGGACCTAT ACAAAACCAA CCACAACAAT GTGTATTGGC TGACTATCCC
GCCAATGAAG AACCTAGCCT TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA
GGTTAGTCCC AACCTCTTCA CTGTTCCAAT CAAGGAAGCA GGCGAGGACT GCCATGCCCC
AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTAATTCT
ACCTGGTCAG GATCTCCAAT ATGTTTTGGC AACCTACGAT ACTTCCAGGG TTGAACATGC
TGTGGTTTAT TATGTTTACA GCCCAGGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT
GCCTATAAAG GGGGTCCCAA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT
CTGGTGCCGT CACTTCTGTG TGCTTGCGGA TTCAGAATCT GGTGGACATA TCACTCACTC
TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACTCGG GAAGATGGAA CCAATCGCAG
ATAGGGCTGC CAGTGAACCG ATCACATGAT GTCACTCAGA CACCAGGCAT ACCCACTAGT 
GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTTCCC GTCATGGACT 
CGCTATCTGT CAACCAGATC TTGTACCCTG AAGTTCACCT AGATAGCCCG ATAGTTACCA 
ATAAGATAGT AGCTATCCTG GAGTATGCTC GAGTCCCTCA CGCTTACAGC CTTGAGGACC 
CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTCTCCAAC CAAATGATTA 
TAAACAATGT GGAAGTTGGG AATGTCATGA AGTCCAAGCT TAGGAGTTAT CCGGCCCACT 
CTCATATTCC ATATCCAAAT TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA 
CAAGGAAGAT CCGTGAGCTC CTAAAAAAGG GAAATTCGCT GTACTCCAAA GTCAGTGATA 
AGGTTTTCCA ATGCCTGAGG GACACTAACT CACGGCTTGG CCTAGGCTCC GAATTGAGGG 
AGGACATCAA GGAGAAAATT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAATGGTTTG
AGCCCTTTCT GTTTTGGTTT ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA
CCCATACTTG CCATAGGAGG AGACACACAC CTGTATTCTT CACTGGTAGT TCAGTTGAGC 
TGTTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAGGA GTCTCAACAT GTATATTACC 
TGACGTTTGA ACTGGTTTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA
CCGCTATGAC CATTGATGCT AGGTATGCAG AACTTCTAGG AAGAGTCAGA TACATGTGGA 
AACTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTATCAAATT GTAGCTATGC 
TGGAGCCACT TTCACTTGCT TACCTGCAAC TGAGGGACAT AACAGTAGAA CTCAGAGGTG 
CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG
ATGAAGGTAC TTATCATGAG TTAATTGAAG CCTTAGATTA CATTTTCATA ACTGATGACA
TACATCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG 
CAGTAACGGC TGCTGAAAAT GTCAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG 
AGACTCTGAT GAAGGGTCAT GCCATATTTT GTGGAATCAT AATCAACGGC TATCGTGACA 
GGCACGGAGG CAGTTGGCCA CCCCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA 
ATGCTCAAGC TTCAGGTGAA GGGTTAACAC ATGAGCAGTG CGTTGATAAC TGGAGATCAT 
TTGCTGGAGT GAGATTTGGC TGTTTTATGC CTCTTAGCCT GGACAGTGAT CTGACAATGT 
ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG
AGTTCCTGCG TTACGATCCT CCCAAGGGAA CCGGGTCACG GAGGCTTGTA GATGTTTTCC 
TTAATGATTC GAGCTTTGAC CCATATGATA TGATAATGTA TGTCGTAAGT GGAGCCTACC 
TCCATGACCC TGAGTTCAAT CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG 
GTAGACTTTT CGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATC GCTGAAAATC 
TAATCTCAAA CGGGATTGGC AAGTATTTTA AGGACAATGG GATGGCCAAG GATGAGCACG
ATTTGACTAA GGCACTCCAC ACTCTGGCTG TCTCAGGAGT CCCCAAAGAT CTCAAAGAAA 
GTCACAGGGG GGGGCCAGTC TTAAAAACCT ACTCCCGAAG CCCAGTCCAC ACAAGTACCA 
GGAACGTTAA AGCAGAAAAA GGGTTTGTAG GATTCCCTCA TGTAATTCGG CAGAATCAAG 
ACACTGATCA TCCGGAGAAT ATAGAAACCT ACGAGACAGT CAGCGCATTT ATCACGACTG 
ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTATTT GCACAGAGGC 
TAAATGAGAT TTACGGATTA CCCTCATTTT TTCAGTGGCT GCATAAGAGG CTTGAAACCT 
CTGTCCTCTA TGTAAGTGAT CCTCATTGCC CCCCCGACCT TGACGCCCAT GTCCCGTTAT 
GCAAAGTCCC CAATGACCAA ATCTTCATCA AGTACCCTAT GGGAGGTATA GAAGGGTATT
GTCAGAAGCT GTGGACCATC AGCACCATTC CCTACTTATA CCTGGCTGCT TATGAGAGCG 
GGGTAAGGAT TGCCTCGTTA GTGCAAGGGG ACAATCAGAC CATAGCCGTA ACAAAAAGGG 
TACCCAGCAC ATGGCCTTAC AACCTTAAGA AACGGGAAGC TGCTAGAGTA ACTAGAGATT 
ACTTTGTAAT TCTTAGGCAA AGGCTACATG ACATTGGCCA TCACCTCAAG GCAAATGAGA 
CAATTGTTTC ATCACATTTT TTTGTCTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG 
TGTCCCAATC ACTCAAGAGC ATTGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG
AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 
ATGACCGTTA TCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAA ATTTTGATCT 
CTCTTGGCTT CACAATCAAT TCAACCATGA CCCGAGATGT AGTCATACCC CTCCTCACAA 
ACAACGATCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATTGGGGGG ATGAATTATC 
TGAACATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 
ATCTCAAGAG AATGATTCTC GCATCACTAA TGCCTGAAGA GACCCTCCAT CAAGTAATGA 
CACAACAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC
TTGTATGCGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCCTAA
TCCATAGTCC AAACCCAATG TTAAAAGGGT TATTCCATGA TGACAGTAAA GAAGAGGACG 
AGAGACTGGC GGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCATGAAA 
TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTA GATACCACAA 
AAGGCCTGAT TCGAGCCAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT 
TGTCCAATTA TGACTATGAA CAATTTAGAG CAGGGATGGT GCTATTGACA GGAAGAAAGA 
GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCTAGAGCC CTAAGAAGCC 
ATATGTGGGC AAGACTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC 
TAGAATCTAT GCGAGGCCAC CTTATTCGGC GTCATGAGAC ATGTGTCATC TGCGAGTGTG 
GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA 
AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA 
TGAAGCTTGC CTTCGTAAGA GCCCCAAGTA GATCCTTGCG ATCTGCCGTT AGAATAGCAA 
CAGTGTACTC ATGGGCTTAC GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG 
CAAGGGAAAG GGCCAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCGACTT 
CGACTAATTT AGCGCATAGG TTGAGGGATC GTAGCACTCA AGTGAAATAC TCAGGTACAT
CCCTTGTCAG AGTGGCAAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT 
CAGATAAGAA AGTTGATACT AACTTTATAT ACCAACAAGG AATGCTTCTA GGGTTGGGTG 
TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACTGGATC ATCTAACACG GTATTACATC 
TTCACGTCGA AACAGATTGT TGCGTGATCC CGATGATAGA TCATCCCAGG ATACCCAGCT 
CCCGCAAGCT AGAGCTGAGG GCAGAGCTAT GTACCAACCC ATTGATATAT GATAAXGCAC 
CTTTAATTGA CAGAGATGCA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTGG 
AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTCT AGCTAAGTCC ACAGCACTAT 
CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA 
TAGGGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTTATAGAG CCAAGATTAT 
TCACCATCTA CTTGGGCCAG TGTGCAGCCA TCAATTGGGC ATTTGATGTA CATTATCATA 
GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCTTC GTTCCTTTCT AGAATGAGCA 13800
AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT 13860
GGCATTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTGCACA 13920
CAACTGTGTG CAACATGGTT TACACATGCT ATATGACCTA CCTTGACCTG TTGTTGAATG 13980
AAGAGTTAGA AGAGTTCACA TTTCTTTTGT GTGAAAGCGA TGAGGATGTA GTACCGGACA 14040
GATTCGACAA CATCCAGGCA AAACACTTGT GTGTTCTGGC AGATTTGTAC TGTCAACCAG 14100
GGACCTGCCC ACCGATTCGA GGTCTAAGGC CGGTAGAGAA ATGTGCAGTT CTAACCGATC 14160
ATATCAAGGC AGAGGCTAGG TTATCTCCAG CAGGATCTTC GTGGAACATA AATCCAATTA 14220
TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTCCGTCG AGGATCTATC AAACAGATAA 14280
GATTGAGAGT TGATCCAGGA TTCATTTTTG ATGCCCTCGC TGAGGTAAAT GTCAGTCAGC 14340
CAAAGGTCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGATTTCAGA CCTCCACACG 14400
ATGATGTTGC AAAATTGCTC AAAGATATCA ACACAAGCAA GCACAATCTT CCCATTTCAG 14460
GGGGTAGTCT TGCCAATTAT GAAATCCATG CTTTCCGCAG AATCGGGTTA AACTCATCTG 14520
CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAAG 14580
ACGGCTTGTT CTTGGGTGAG GGGTCGGGTT CTATGTTGAT CACTTATAAG GAGATACTAA 14640
AACTAAACAA GTGCTTCTAT AATAGTGGGG TTTCCGCCAA TTCTAGATCT GGTCAAAGGG 14700
AATTAGCACC CTATCCCTCC GAAGTTGGCC TTGTCGAACA CAGAATGGGA GTAGGTAATA 14760
TTGTCAAGGT GCTCTTTAAC GGGAGGCCCG AAGTCACGTG GGTAGGCAGT ATAGATTGCT 14820
TCAATTTCAT AGTCAGTAAT ATCCCTACCT CTAGTGTGGG ATTTATCCAT TCAGATATAG 14880
AGACCTTACC CAACAAAGAT ACTATAGAGA AGTTAGAGGA ATTGGCAGCC ATCTTATCGA 14940
TGGCTCTACT CCTTGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 15000
GGGATTTTGT TCAGGGATTT ATAAGCTATG TAGGGTCTCA TTATAGAGAA GTGAACCTTG 15060
TCTACCCTAG GTACAGCAAC TTCATATCTA CTGAATCTTA TTTAGTTATG ACAGATCTCA 15120
AAGCTAACCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCTGTGC 15180
GGACTTCACC TGGACTTATA GGTCACATCC TATCTATCAA GCAACTAAGC TGCATACAAG 15240
CAATTGTGGG AGGCGCAGTT AGTAGAGGTG ATATCAACCC TATTCTGAAA AAACTTACAC 15300
CTATAGAGCA GGTGCTGATC AGTTGCGGGT TGGCAATTAA CGGACCTAAG CTGTGCAAAG 15360

AATTAATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAACTCT ATACTCATCC 15420 

TCTACAGGGA GTTGGCAAGA TTCAAAGACA ACCAAAGAAG TCAACAAGGG ATGTTCCACG 15480 

CTTACCCCGT ATTGGTAAGT AGTAGGCAAC GAGAACTTGT ATCTAGGATC ACXCGGAAAT 15540 

TTTGGGGGCA TATTCTTCTT TACTCCGGGA ACAGAAAGTT GATAAATCGG TTTATCCAGA 15600 

ATCTCAAGTC CGGTTATCTA ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT 15660 

CCAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTAAA ACGTGAGTGG GTTTTTAAGG 15720 

TAACAGTCAA GGAGACCAAA GAATGGTATA AGTTAGTCGG ATACAGCGCT CTGATTAAGG 15780 

ATTAATTGGT TGAACTCCGG AACCCTAATC CTACCCTAGG TAGTTAGGCA TTATTTGCAA 15840

TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 15894 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2183 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu
1 5 10 15

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala
20 25 30

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn
35 40 45

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn
50 55 60

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro
65 70 75 80

Ala His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn
85 90 95
Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys
100 105 110

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asp Lys Val Phe Gln Cys Leu
115 120 125

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp
130 135 140

Ile Lys Glu Lys Ile Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln
145 150 155 160

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg
165 170 175

Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr
180 185 190

Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp
195 200 205

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr
210 215 220

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met
225 230 235 240

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Ala Glu Leu Leu Gly
245 250 255

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu
260 265 270

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu
275 280 285

Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe
290 295 300

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly
305 310 315 320

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Ile Glu Ala Leu Asp Tyr
325 330 335

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe
340 345 350

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu
355 360 365
Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr
370 375 380

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr
385 390 395 400

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His
405 410 415

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr
420 425 430

His Glu Gln Cys Val Asp Asn Trp Arg Ser Phe Ala Gly Val Arg Phe
435 440 445

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu
450 455 460

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr
465 470 475 480

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg
485 490 495

Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp
500 505 510

Met Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe
515 520 525

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg
530 535 540

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala
545 550 555 560

Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly
565 570 575

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala
580 585 590

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro
595 600 605

Val Leu Lys Thr Tyr Ser Arg Ser Pro Val His Thr Ser Thr Arg Asn
610 615 620

Val Lys Ala Glu Lys Gly Phe Val Gly Phe Pro His Val Ile Arg Gln
625 630 635 640

Asn Gln Asp Thr Asp His Pro Glu Asn Ile Glu Thr Tyr Glu Thr Val
645 650 655

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly
675 680 685

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val
690 695 700

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Val
705 710 715 720

Pro Leu Cys Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met
725 730 735

Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile
740 745 750

Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser
755 760 765

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro
770 775 780

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Arg Glu Ala Ala Arg Val Thr
785 790 795 800

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His
805 810 815

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr
820 825 830

Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys
835 840 845

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr
850 855 860

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu
865 870 875 880

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val
885 890 895

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met
900 905 910

Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile
915 920 925
Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn
930 935 940

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser
945 950 955 960

Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu
965 970 975

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser
995 1000 1005

Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His
1010 1015 1020

Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu
1025 1030 1035 1040

Glu Asp Glu Arg Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val
1045 1050 1055

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg
1060 1065 1070

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala
1075 1080 1085

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser
1090 1095 1100

Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly
1105 1110 1115 1120

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu
1125 1130 1135

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg
1140 1145 1150

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly
1155 1160 1165

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser
1170 1175 1180

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp
1185 1190 1195 1200
Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr
1205 1210 1215

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser
1220 1225 1230

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala
1235 1240 1245

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg
1250 1255 1260

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile
1265 1270 1275 1280

Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln
1285 1290 1295

Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr
1300 1305 1310

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp
1315 1320 1325

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu
1330 1335 1340

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val
1345 1350 1355 1360

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp
1365 1370 1375

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu
1380 1385 1390

Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp
1395 1400 1405

Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe
1410 1415 1420

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr
1425 1430 1435 1440

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met
1445 1450 1455

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile
1460 1465 1470

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly
1475 1480 1485

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro
1490 1495 1500

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg
1505 1510 1515 1520

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro
1525 1530 1535

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His
1540 1545 1550

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met
1555 1560 1565

Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu
1570 1575 1580

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val
1585 1590 1595 1600

Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala
1605 1610 1615

Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg
1620 1625 1630

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala
1635 1640 1645

Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val
1650 1655 1660

Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys
1665 1670 1675 1680

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala
1685 1690 1695

Glu Val Asn Val Ser Gln Pro Lys Val Gly Ser Asn Asn Ile Ser Asn
1700 1705 1710

Met Ser Ile Lys Asp Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu
1715 1720 1725

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly
1730 1735 1740

Ser Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn
1745 1750 1755 1760
Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg
1765 1770 1775

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly
1780 1785 1790

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe
1795 1800 1805

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu
1810 1815 1820

Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val
1825 1830 1835 1840

Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp
1845 1850 1855

Val Gly Ser Ile Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr
1860 1865 1870

Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asn Lys
1875 1880 1885

Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala
1890 1895 1900

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro
1905 1910 1915 1920

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His
1925 1930 1935

Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser
1940 1945 1950

Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met
1955 1960 1965

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr
1970 1975 1980

Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys
1985 1990 1995 2000

Ile Gln Ala Ile Val Gly Gly Ala Val Ser Arg Gly Asp Ile Asn Pro
2005 2010 2015

Ile Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Ser Cys Gly
2020 2025 2030
Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp
2035 2040 2045

Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr
2050 2055 2060

Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met
2065 2070 2075 2080

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Val
2085 2090 2095

Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly
2100 2105 2110

Asn Arg Lys Leu Ile Asn Arg Phe Ile Gln Asn Leu Lys Ser Gly Tyr
2115 2120 2125

Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys
2130 2135 2140

Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val
2145 2150 2155 2160

Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly
2165 2170 2175

Tyr Ser Ala Leu Ile Lys Asp
2180

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15894 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

ACCAAACAAA GTTGGGTAAG GATAGTTCAA TCAATGATCA TCTTCTAGTG CACTTAGGAT 60 

TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAGGGAT ATCCGAGATG GCCACACTTT 120 

TAAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 180

GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATCCCTGGA GATTCCTCAA 240 
TTACCACTCG ATCCAGACTT CTGGACCGGT TGGTCAGGTT AATTGGAAAC CCGGATGTGA 300
GCGGGCCCAA ACTAACAGGG GCACTAATAG GTATATTATC CTTATTTGTG GAGTCTCCAG 360
GTCAATTGAT TCAGAGGATC ACCGATGACC CTGACGTTAG CATAAGGCTG TTAGAGGTTG 420
TCCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG 480
ATGAGGCGGA CCAATACTTT TCACATGATG ATCCAATTAG TAGTGATCAA TCCAGGTTCG 540
GATGGTTCGA GAACAAGGAA ATCTCAGATA TTGAAGTGCA AGACCCTGAG GGATTCAACA 600
TGATTCTGGG TACCATCCTA GCCCAAATTT GGGTCTTGCT CGCAAAGGCG GTTACGGCCC 660
CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG 720
TGGTTGGTGA ATTTAGATTG GAGAGAAAAT GGTTCGATGT GGTGAGGAAC AGGATTGCCG 780
AGGACCTCTC CTTACGCCGA TTCATGGTCG CTCTAATCCT GGATATCAAG AGAACACCCG 840
GAAACAAACC CAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG 900
GATTAGCCAG TTTTATCCTG ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG 960
GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAC CTTTACCAGC 1020
AAATGGGGGA AACTGCACCC TACATGGTAA TCCTGGAGAA CTCAATTCAG AACAAGTTCA 1080
GTGCAGGATC ATACCCTCTG CTCTGGAGCT ATGCCATGGG AGTAGGAGTG GAACTTGAAA 1140
ACTCCATGGG AGGTTTGAAC TTTGGCCGAT CTTACTTTGA TCCAGCATAT TTTAGATTAG 1200
GGCAAGAGAT GGTAAGGAGG TCAGCTGGAA AGGTCAGTTC CACATTGGCA TCTGAACTCG 1260
GTATCACTGC CGAGGATGCA AGGCTTGTTT CAGAGATTGC AATGCATACT ACTGAGGACA 1320
AGATCAGTAG AGCGGTTGGA CCCAGACAAG CCCAAGTATC ATTTCTACAC GGTGATCAAA 1380
GTGAGAATGA GCTACCGAGA TTGGGGGGCA AGGAAGATAG GAGGGTCAAA CAGAGTCGAG 1440
GAGAAGCCAG GGAGAGCTAC AGAGAAACCG GGCCCAGCAG AGCAAGTGAT GCGAGAGCTG 1500
CCCATCTTCC AACCGGCACA CCCCTAGACA TTGACACTGC AACGGAGTCC AGCCAAGATC 1560
CGCAGGACAG TCGAAGGTCA GCTGACGCCC TGCTTAGGCT GCAAGCCATG GCAGGAATCT 1620
CGGAAGAACA AGGCTCAGAC ACGGACACCC CTATAGTGTA CAATGACAGA AATCTTCTAG 1680
ACTAGGTGCG AGAGGCCGAG GGCCAGAACA ACATCCGCCT ACCCTCCATC ATTGTTATAA 1740
AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCCATCA ACCATCCACT CCCACGATTG 1800
GAGCCAATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT 1860
CTCAAGGCCG AGCCCATCGG CTCACTGGCC ATCGAGGAAG CTATGGCAGC ATGGTCAGAA 1920
ATATCAGACA ACCCAGGACA GGAGCGAGCC ACCTGCAGGG AAGAGAAGGC AGGCAGTTCG 1980
GGTCTCAGCA AACCATGCCT CTCAGCAATT GGATCAACTG AAGGCGGTGC ACCTCGCATC 2040
CGCGGTCAGG GACCTGGAGA GAGCGATGAC GACGCTGAAA CTTTGGGAAT CCCCCCAAGA 2100
AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATTACG TTTATGATCA CAGCGGTGAA 2160
GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT 2220
AGCACCCTCT CAGGAGGAGA CAATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT 2280
GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG 2340
GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAG ACTCCAATCC 2400
AGAGGCAACA ACTTTCCGAA GCTTGGGAAA ACTCTCAATG TTCCTCCGCC CCCGGACCCC 2460
GGTAGGGCCA GCACTTCCGG GACACCCATT AAAAAGGGCA CAGACGCGAG ATTAGCCTCA 2520
TTTGGAACGG AGATCGCGTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA 2580
CCCTCGGAAC CATCAGGGCC AGGTGCACCT GCGGGGAATG TCCCCGAGTG TGTGAGCAAT 2640
GCCGCACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG 2700
AATAATGAAG AAGGGGGAGA CTATTATGAT GATGAGCTGT TCTCTGATGT CCAAGATATT 2760
AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCTCCAA GCTAGAATCA 2820
CTGCTGTTAT TGAAGGGAGA AGTTGAGTCA ATTAAGAAGC AGATCAACAG GCAAAATATC 2880
AGCATATCCA CCCTGGAAGG ACACCTCTCA AGCATCATGA TCGCCATTCC TGGACTTGGG 2940
AAGGATCCCA ACGACCCCAC TGCAGATGTC GAAATCAATC CCGACTTGAA ACCCATCATA 3000
GGCAGAGATT CAGGCCGAGC ACTGGCCGAA GTTCTCAAGA AACCCGTTGC CAGCCGACAA 3060
CTCCAAGGAA TGACAAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAG 3120
CTAAAGCCGA TCGGGAAAAA GATGAGCTCA GCCGTCGGGT TTGTTCCTGA CACCGGCCCT 3180
GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 3240
CGTTACCTGA TGACTCTCCT TGATGATATC AAAGGAGCCA ATGATCTTGC CAAGTTCCAC 3300
CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCCCATG 3360
CCAGTCGACC CAACTAGTAC AACCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT
GCCTCCCAAG TTCCACAATG ACAGAGACCT ACGACTTCGA CAAGTCGGCA TGGGACATCA 
AAGGGTCGAT CGCTCCGATA CAACCCACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG 
TCAGAGTCAT AGATCCTGGT CTAGGCGACA GGAAGGATGA ATGCTTTATG TACATGTTTC 
TGCTGGGGGT TGTTGAGGAC AGCGATTCCC TAGGGCCTCC AATCGGGCGA GCATTTGGGT 
CCCTGCCCTT AGGTGTTGGC AGATCCACAG CAAAGCCCGA AAAACTCCTC AAAGAGGCCA 
CTGAGCTTGA CATAGTTGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA 
ACAACACCCC ACTAACTCTC CTCACACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT 
TCAACGCAAA CCAAGTGTGC AATGCGGTTA ATCTGATACC GCTCGATACC CCGCAGAGGT 
TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCGGATAA CGGGTATTAC ACCGTTCCTA
GAAGAATGCT GGAATTCAGA TCGGTCAATG CAGTGGCCTT CAACCTGCTG GTGACCCTTA
GGATTGACAA GGCGATAGGC CCTGGGAAGA TCATCGACAA TACAGAGCAA CTTCCTGAGG
CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA GAGTGAAGTC TACTCTGCCG 
ATTATTGCAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG 
GCACCAGTCT TCACATTAGA AGCACAGGCA AAATGAGCAA GACTCTCCAT GCACAACTCG 
GGTTCAAGAA GACCTTATGT TACCCGCTGA TGGATATCAA TGAAGACCTT AATCGATTAC 
TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCTC 
AAGAATTCCG CATTTACGAC GACGTGATCA TAAATGATGA CCAAGGACTA TTCAAAGTTC 
TGTAGACCGT AGTGCCCAGC AATGCCCGAA AACGACCCCC CTCACAATGA CAGCCAGAAG 
GCCCGGACAA AAAAGCCCCC TCCGAAAGAC TCCACGGACC AAGCGAGAGG CCAGCCAGCA
GCCGACGGCA AGCGCGAACA CCAGGCGGCC CCAGCACAGA ACAGCCCCGA CACAAGGCCA
CCACCAGCCA CCCCAATCTG CATCCTCCTC GTGGGACCCC CGAGGACCAA CCCCCAAGGC
TGCCCCCGAT CCAAACCACC AACCGCATCC CCACCACCCC CGGGAAAGAA ACCCCCAGCA 
ATTGGAAGGC CCCTCCCCCT CTTCCTCAAC ACAAGAACTC CACAACCGAA CCGCACAAGC 
GACCGAGGTG ACCCAACCGC AGGCATCCGA CTCCCTAGAC AGATCCTCTC TCCCCGGCAA
ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCAAC AGAACCCAGA CCCCGGCCCA 
CGGCGCCGCG CCCCCAACCC CCGACAACCA GAGGGAGCCC CCAACCAATC CCGCCGGCTC 4980
CCCCGGTGCC CACAGGCAGG GACACCAACC CCCGAACAGA CCCAGCACCC AACCATCGAC 5040
AATCCAAGAC GGGGGGGCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCACCGC 5100
GAGGAAGCCC ACCCACCCCA CACACGACCA CGGCAACCAA ACCAGAACCC AGACCACCCT 5160
GGGCCACCAG CTCCCAGACT CGGCCATCAC CCCGCAGAAA GGAAAGGCCA CAACCCGCGC 5220
ACCCCAGCCC CGATCCGGCG GGGAGCCACC CAACCCGAAC CAGCACCCAA GAGCGATCCC 5280
CGAAGGACCC CCGAACCGCA AAGGACACCA GTATCCCACA GCCTCTCCAA GTCCCCCGGT 5340
CTCCTCCTCT TCTCGAAGGG ACCAAAAGAT CAATCCACCA CACCCGACGA CACTCAACTC 5400
CCCACCCCTA AAGGAGACAC CGGGAATCCC AGAATCAAGA CTCATCCAAT GTCCATCATG 5460
GGTCTCAAGG TGAACGTCTC TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAAACACCC 5520
ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGAAT AGGAAGTGCA 5580
AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTAG TCATAAAATT AATGCCCAAT 5640
ATAACTCTCC TCAATAACTG CACGAGGGTA GAGATTGCAG AATACAGGAG ACTACTGAGA 5700
ACAGTTTTGG AACCAATTAG AGATGCACTT AATGCAATGA CCCAGAATAT AAGACCGGTT 5760
CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCGG GAGTAGTCCT GGCAGGTGCG 5820
GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 5880
CTGAACTCTC AAGCCATCGA CAATCTGAGA GCGAGCCTGG AAACTACTAA TCAGGCAATT 5940
GAGGCAATCA GACAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC 6000
ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCCAGAAG 6060
CTCGGGCTCA AATTGCTCAG ATACTATACA GAAATCCTGT CATTATTTGG CCCCAGTTTA 6120
CGGGACCCCA TATCTGCGGA GATATCTATC CAGGCTTTGA GCTATGCGCT TGGAGGAGAC 6180
ATCAATAAGG TGTTAGAAAA GCTCGGATAC AGTGGAGGTG ATTTACTGGG CATCTTAGAG 6240
AGCGGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTCCTC 6300
AGTATAGCCT ATCCGACGCT GTCCGAGATT AAGGGGGTGA TTGTCCACCG GCTAGAGGGG 6360
GTCTCGTACA ACATAGGCTC TCAAGAGTGG TATACCACTG TGCCCAAGTA TGTTGCAACC 6420
CAAGGGTACC TTATCTCGAA TTTTGATGAG TCATCGTGTA CTTTCATGCC AGAGGGGACT 6480



SUBSTITUTE SHEET (RULE 26) 





WO 98/13501 



PCT/US97/16718 



- 166 - 



GTGTGCAGCC AAAATGCCTT GTACCCGATG AGTCCTCTGC TCCAAGAATG CCTCCGGGGG 
TACACCAAGT CCTGTGCTCG TACACTCGTA TCCGGGTCTT TTGGGAACCG GTTCATTTTA 
TCACAAGGGA ACCTAATAGC CAATTGTGCA TCAATCCTTT GCAAGTGTTA CACAACAGGA 
ACGATCATTA ATCAAGACCC TGACAAGATC CTAACATACA TTGCTGCCGA TCACTGCCCG 
GTAGTCGAGG TGAACGGCGT GATCATCCAA GTCGGGAGCA GGAGGTATCC AGACGCTGTG 
TACTTGCACA GAATTGACCT CGGTCCTCCC ATATCATTGG AGAGGTTGGA CGTAGGGACA 
AATCTGGGGA ATGCAATTGC TAAGTTGGAG GATGCCAAGG AATTGTTGGA GTCATCGGAC 
CAGATATTGA GGAGTATGAA AGGTTTATCG AGCACTAGCA TAGTCTACAT CCTGATTGCA 
GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT 
AACAAAAAGG GAGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACGGGA 
ACATCAAAAT CCTATGTAAG GTCGCTCTGA TCCTCTACAA CTCTTGAAAC ACAAATGTCC 
CACAAGTCTC CTCTTCGTCA TCAAGCAACC ACCGCACCCA GCATCAAGCC CACCTGAAAT 
TATCTCCGGC TTCCCTCTGG CCGAACAATA TCGGTAGTTA ATTAAAACTT AGGGTGCAAG 
ATCATCCACA ATGTCACCAC AACGAGACCG GATAAATGCC TTCTACAAAG ATAACCCCCA 
TCCCAAGGGA AGTAGGATAG TCATTAACAG AGAACATCTT ATGATTGATA GACCTTATGT 
TTTGCTGGCT GTTCTGTTTG TCATGTTTCT GAGCTTGATC GGGTTGCTAG CCATTGCAGG 
CATTAGACTT CATCGGGCAG CCATCTACAC CGCAGAGATC CATAAAAGCC TCAGCACCAA 
TCTAGATGTA ACTAACTCAA TCGAGCATCA GGTCAAGGAC GTGCTGACAC CACTCTTCAA 
AATCATCGGT GATGAAGTGG GCCTGAGGAC ACCTCAGAGA TTCACTGACC TAGTGAAATT 
CATCTCTGAC AAGATTAAAT TCCTTAATCC GGATAGGGAG TACGACTTCA GAGATCTCAC 
TTGGTGTATC AACCCGCCAG AGAGAATCAA ATTGGATTAT GATCAATACT GTGCAGATGT 
GGCTGCTGAA GAGCTCATGA ATGCATTGGT GAACTCAACT CTACTGGAGA CCAGAACAAC 
CAATCAGTTC CTAGCTGTCT CAAAGGGAAA CTGCTCAGGG CCCACTACAA TCAGAGGTCA 
ATTCTCAAAC ATGTCGCTGT CCCTGTTAGA CTTGTATTTA GGTCGAGGTT ACAATGTGTC 
ATCTATAGTC ACTATGACAT CCCAGGGAAT GTATGGGGGA ACTTACCTAG TGGAAAAGCC 
TAATCTGAGC AGCAAAAGGT CAGAGTTGTC ACAACTGAGC ATGTACCGAG TGTTTGAAGT 
AGGTGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATCTTGA 8100 
GCAACCAGTC AGTAATGATC TCAGCAACTG TATGGTGGCT TTGGGGGAGC TCAAACTCGC 8160 
AGCCCTTTGT CACGGGGAAG ATTCTATCAC AATTCCCTAT CAGGGATCAG GGAAAGGTGT 8220 

CAGCTTCCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT 8280 

CCCCTTATCA ACGGATGATC CAGTGATAGA CAGGCTTTAC CTCTCATCTC ACAGAGGTGT 8340 

TATCGCTGAC AATCAAGCAA AATGGGCTGT CCCGACAACA CGAACAGATG ACAAGTTGCG 8400 

AATGGAGACA TGCTTCCAAC AGGCGTGTAA GGGTAAAATC CAAGCACTCT GCGAGAATCC 8460 

CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGGGTCTTGT CTGTTGATCT 8520 

GAGTCTGACA GTTGAGCTTA AAATCAAAAT TGCTTCGGGA TTCGGGCCAT TGATCACACA 8580 

CGGTTCAGGG ATGGACCTAT ACAAATCCAA CCACAACAAT GTGTATTGGC TGACTATCCC 8640 

GCCAATGAAG AACCTAGCCT TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA 8700 

GGTTAGTCCC TACCTCTTCA CTGTCCCAAT TAAGGAAGCA GGCGAAGACT GCCATGCCCC 8760 

AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTGATTCT 8820 

ACCTGGTCAA GATCTCCAAT ATGTTTTGGC AACCTACGAT ACTTCCAGGG TTGAACATGC 8880 

TGTGGTTTAT TACGTTTACA GCCCAAGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT 8940 

GCCTATAAAG GGGGTCCCCA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT 9000 

CTGGTGCCGT CACTTCTGTG TGCTTGCGGA CTCAGAATCT GGTGGACATA TCACTCACTC 9060 

TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACCCGG GAAGATGGAA CCAATCGCAG 9120 

ATAGGGCTGC TAGTGAACCA ATCACATGAT GTCACCCAGA CATCAGGCAT ACCCACTAGT 9180 

GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTCCCC GTTATGGACT 9240 

CGCTATCTGT CAACCAGATC TTATACCCTG AAGTTCACCT AGATAGCCCG ATAGTTACCA 9300 

ATAAGATAGT AGCCATCCTG GAGTATGCTC GAGTCCCTCA CGCTTACAGC CTGGAGGACC 9360 

CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTTTCCAAC CAAATGATTA 9420 

TAAACAATGT GGAAGTTGGG AATGTCATCA AGTCCAAGCT TAGGAGTTAT CCGGCCCACT 9480 

CTCATATTCC ATATCCAAAT TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA 9540 

CGAGGAAGAT CCGTGAACTC CTCAAAAAGG GGAATTCGCT GTACTCCAAA GTCAGTGATA 9600 
AGGTTTTCCA ATGCTTAAGG GACACTAACT CACGGCTTGG CCTAGGCTCC GAATTGAGGG 9660 

AGGACATCAA GGAGAAAGTT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAGTGGTTTG 9720 

AGCCCTTTCT GTTTTGGTTT ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA 9780 

CCCATACTTG CCATAGGAGG AGACACACAC CTGTATTCTT CACTGGTAGT TCAGTTGAGT 9840 

TGCTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAAGA GTCTCAACAT GTATATTACC 9900 

TGACATTTGA ACTGGTTTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA 9960 

CCGCTATGAC TATTGATGCT AGGTATACAG AGCTTCTAGG AAGAGTCAGA TACATGTGGA 10020 

AACTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTATCAAATT GTAGCCATGC 10080 

TGGAGCCTCT TTCACTTGCT TACCTGCAGC TGAGGGATAT AACAGTAGAA CTCAGAGGTG 10140 

CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG 10200 

ATGAAGGTAC TTATCATGAG TTAATTGAAG CTCTAGATTA CATTTTCATA ACTGATGACA 10260 

TACATCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG 10320 

CAGTAACGGC TGCTGAAAAT GTTAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG 10380 

AGACTCTGAT GAAAGGTCAT GCCATATTTT GTGGAATCAT AATCAACGGC TATCGTGACA 10440 

GGCACGGAGG CAGTTGGCCA CCGCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA 10500 

ATGCTCAAGC TTCAGGTGAA GGGTTAACAC ATGAGCAGTG CGTTGATAAC TGGAAATCTT 10560 

TTGCTGGAGT GAAATTTGGC TGCTTTATGC CTCTTAGCCT GGATAGTGAT CTGACAATGT 10620 

ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG 10680 

AGTTCCTGCG TTACGACCCT CCCAAGGGAA CCGGGTCACG GAGGCTTGTA GATGTTTTCC 10740 

TTAATGATTC GAGCTTTGAC CCATATGATG TGATAATGTA TGTTGTAAGT GGAGCTTACC 10800 

TCCATGACCC TGAGTTCAAC CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG 10860 

GTAGACTTTT TGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATT GCTGAAAATC 10920 

TAATCTCAAA CGGGATTGGC AAATATTTTA AGGACAATGG GATGGCCAAG GATGAGCACG 10980 

ATTTGACTAA GGCACTCCAC ACTCTAGCTG TCTCAGGAGT CCCCAAAGAT CTCAAAGAAA 11040 

GTCACAGGGG GGGGCCAGTC TTAAAAACCT ACTCCCGAAG CCCAGTCCAC ACAAGTACCA 11100 

GGAACGTGAG AGCAGCAAAA GGGTTTATAG GGTTCCCTCA AGTAATTCGG CAGGACCAAG 11160 
ACACTGATCA TCCGGAGAAT ATGGAAGCTT ACGAGACAGT CAGTGCATTT ATCACGACTG 11220 

ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTGTTT GCACAGAGGC 11280 

TAAATGAGAT TTACGGATTG CCCTCATTTT TCCAGTGGCT GCATAAGAGG CTTGAGACCT 11340 

CTGTCCTGTA TGTAAGTGAC CCTCATTGCC CCCCCGACCT TGACGCCCAT ATCCCGTTAT 11400 

ATAAAGTCCC CAATGATCAA ATCTTCATTA AGTACCCTAT GGGAGGTATA GAAGGGTATT 11460 

GTCAGAAGCT GTGGACCATC AGCACCATTC CCTATCTATA CCTGGCTGCT TATGAGAGCG 11520 

GAGTAAGGAT TGCTTCGTTA GTGCAAGGGG ACAATCAGAC CATAGCCGTA ACAAAAAGGG 11580 

TACCCAGCAC ATGGCCCTAC AACCTTAAGA AACGGGAAGC TGCTAGAGTA ACTAGAGATT 11640 

ACTTTGTAAT TCTTAGGCAA AGGCTACATG ATATTGGCCA TCACCTCAAG GCAAATGAGA 11700 

CAATTGTTTC ATCACATTTT TTTGTCTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG 11760 

TGTCCCAATC ACTCAAGAGC ATCGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG 11820 

AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 11880 

ATGACCGTTA CCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAA ATTCTGATCT 11940 

CTCTTGGCTT CACAATCAAT TCAACCATGA CCCGGGATGT AGTCATACCC CTCCTCACAA 12000 

ACAACGACCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATTGGGGGG ATGAATTATC 12060 

TGAATATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 12120 

ATCTCAAGAG AATGATTCTC GCCTCACTAA TGCCTGAAGA GACCCTCCAT CAAGTAATGA 12180 

CACAACAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC 12240 

TTGTATGTGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCGTGA 12300 

TCCATAGTCC AAACCCAATG TTAAAAGGAT TATTCCATGA TGACAGTAAA GAAGAGGACG 12360 

AGGGACTGGC GGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCATGAAA 12420 

TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTG GATACCACAA 12480 

AAGGCTTGAT TCGAGCCAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT 12540 

TGTCCAATTA TGACTATGAA CAATTCAGAG CAGGGATGGT GCTATTGACA GGAAGAAAGC 12600 

GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCGAGAGCT CTAAGAAGCC 12660 

ATATGTGGGC GAGGCTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC 12720 
TAGAATCTAT GCGAGGCCAC CTTATTCGGC GTCATGAGAC ATGTGTCATC TGCGAGTGTG 12780 

GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA 12840 

AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA 12900 

TGAAGCTTGC CTTCGTAAGA GCCCCAAGTC GATCCTTGCG ATCTGCTGTT AGAATAGCAA 12960 

CAGTGTACTC ATGGGCTTAC GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG 13020 

CTAGGCAAAG GGCCAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCAACTT 13080 

CGACTAATTT AGCGCATAGG TTGAGGGATC GTAGCACTCA AGTGAAATAC TCAGGTACAT 13140 

CCCTTGTCCG AGTGGCGAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT 13200 

CAGATAAGAA GGTTGATACT AACTTTATAT ACCAACAAGG AATGCTTCTA GGGTTGGGTG 13260 

TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACCGGATC ATCTAACACG GTATTACATC 13320 

TTCACGTCGA AACAGATTGT TGCGTGATCC CGATGATAGA TCATCCCAGG ATACCCAGCT 13380 

CCCGCAAGCT AGAGCTGAGG GCAGAGCTAT GTACCAACCC ATTGATATAT GATAATGCAC 13440 

CTTTAATTGA CAGAGATGCA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTGG 13500 

AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTTT AGCTAAGTCC ACAGCACTAT 13560 

CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA 13620 

TAGGGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTCATAGAG CCAAGATTAT 13680 

TCACTATCTA CTTGGGCCAG TGTGCGGCCA TCAATTGGGC ATTTGATGTA CATTATCATA 13740 

GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCATC GTTCCTTTCT AGAATGAGCA 13800 

AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT 13860 

GGCATTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTGCACA 13920 

CAACTGTGTG CAACATGGTT TACACATGCT ATATGACCTA CCTCGACCTG TTGTTGAATG 13980 

AAGAGTTAGA AGAGTTCACA TTTCTCTTGT GTGAAAGCGA CGAGGATGTA GTACCGGACA 14040 

GATTCGACAA CATCCAGGCA AAACACTTAT GTGTTCTGGC AGATTTGTAC TGTCAACCAG 14100 

GGACCTGCCC ACCAATTCGA GGTCTAAGAC CGGTAGAGAA ATGTGCAGTT CTAACCGACC 14160 

ATATCAAGGC AGAGGCTATG TTATCTCCAG CAGGATCTTC GTGGAACATA AATCCAATTA 14220 

TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTCCGGCG AGGATCGATC AAACAGATAA 14280 

GATTGAGAGT TGATCCAGGA TTCATTTTCG ACGCCCTCGC TGAGGTAAAT GTCAGTGAGC 14340 

CAAAGATCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGCTTTCAGA CCCCCACACG 14400 

ATGATGTTGC AAAATTGCTC AAAGATATCA ACACAAGCAA GCACAATCTT CCCATTTCAG 14460 

GGGGCAATCT CGCCAATTAT GAAATCCATG CTTTCCGCAG AATCGGGTTG AACTCATCTG 14520 

CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAGG 14580 

ACGGCTTGTT CTTGGGTGAG GGATCGGGTT CTATGTTGAT CACTTATAAG GAGATACTTA 14640 

AACTAAACAA GTGCTTCTAT AATAGTGGGG TTTCCGCCAA TTCTAGATCT GGTCAAAGGG 14700 

AATTAGCACC CTATCCCTCC GAAGTTGGCC TTGTCGAACA CAGAATGGGA GTAGGTAATA 14760 

TTGTCAAAGT GCTCTTTAAC GGGAGGCCCG AAGTCACGTG GGTAGGCAGT GTAGATTGCT 14820 

TCAATTTCAT AGTTAGTAAT ATCCCTACCT CTAGTGTGGG GTTTATCCAT TCAGATATAG 14880 

AGACCTTGCC TGACAAAGAT ACTATAGAGA AGCTAGAGGA ATTGGCAGCC ATCTTATCGA 14940 

TGGCTCTGCT CCTGGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 15000 

GGGATTTTGT TCAGGGATTT ATAAGTTATG TAGGGTCTCA TTATAGAGAA GTGAACCTTG 15060 

TATACCCTAG ATACAGCAAC TTCATCTCTA CTGAATCTTA TTTGGTTATG ACAGATCTCA 15120 

AGGCTAACCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCTGTGA 15180 

GGACTTCACC TGGACTTATA GGTCACATCC TATCCATTAA GCAACTAAGC TGCATACAAG 15240 

CAATTGTGGG AGACGCAGTT AGTAGAGGTG ATATCAATCC TACTCTGAAA AAACTTACAC 15300 

CTATAGAGCA GGTGCTGATC AATTGCGGGT TGGCAATTAA CGGACCTAAG CTGTGCAAAG 15360 

AATTGATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAATTCT ATACTCATCC 15420 

TCTACAGGGA GTTGGCAAGA TTCAAAGACA ACCAAAGAAG TCAACAAGGG ATGTTCCACG 15480 

CTTACCCCGT ATTGGTAAGT AGCAGGCAAC GAGAACTTAT ATCTAGGATC ACCCGCAAAT 15540 

TCTGGGGGCA CATTCTTCTT TACTCCGGGA ACAGAAAGTT GATAAATAAG TTTATCCAGA 15600 

ATCTCAAGTC CGGCTATCTG ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT 15660 

CCAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTGAA ACGTGAGTGG GTTTTTAAGG 15720 

TAACAGTCAA GGAGACCAAA GAATGGTATA AGTTAGTCGG ATACAGTGCC CTGATTAAGG 15780 

ACTAATTGGT TGAACTCCGG AACCCTAATC CTGCCCTAGG TGGTTAGGCA TTATTTGCAA 15840 

TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2183 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu 
1 5 10 15 

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala 
20 25 30 

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn 
35 40 45 

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn 
50 55 60 

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro 
65 70 75 80 

Ala His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn 
85 90 95 

Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys 
100 105 110 

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asp Lys Val Phe Gln Cys Leu 
115 120 125 

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp 
130 135 140 

Ile Lys Glu Lys Val Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln 
145 150 155 160 

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg 
165 170 175 

Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr 
180 185 190 

Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp 
195 200 205 

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr 
210 215 220 

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met 
225 230 235 240 

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Thr Glu Leu Leu Gly 
245 250 255 

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu 
260 265 270 

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu 
275 280 285 

Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe 
290 295 300 

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly 
305 310 315 320 

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Ile Glu Ala Leu Asp Tyr 
325 330 335 

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe 
340 345 350 

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu 
355 360 365 

Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr 
370 375 380 

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr 
385 390 395 400 

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His 
405 410 415 

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr 
420 425 430 

His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe 
435 440 445 

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu 
450 455 460 

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr 
465 470 475 480 

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg 
485 490 495 

Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp 
500 505 510 

Val Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe 
515 520 525 

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg 
530 535 540 

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala 
545 550 555 560 

Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly 
565 570 575 

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala 
580 585 590 

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro 
595 600 605 

Val Leu Lys Thr Tyr Ser Arg Ser Pro Val His Thr Ser Thr Arg Asn 
610 615 620 

Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro Gln Val Ile Arg Gln 
625 630 635 640 

Asp Gln Asp Thr Asp His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val 
645 650 655 

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg 
660 665 670 

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly 
675 680 685 

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val 
690 695 700 

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Ile 
705 710 715 720 

Pro Leu Tyr Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met 
725 730 735 



Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile 
740 745 750 

Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser 
755 760 765 

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro 
770 775 780 

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Arg Glu Ala Ala Arg Val Thr 
785 790 795 800 

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His 
805 810 815 

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr 
820 825 830 

Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys 
835 840 845 

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr 
850 855 860 

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu 
865 870 875 880 

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val 
885 890 895 

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met 
900 905 910 

Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile 
915 920 925 

Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn 
930 935 940 

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser 
945 950 955 960 

Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu 
965 970 975 

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu 
980 985 990 

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser 
995 1000 1005 

Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His 
1010 1015 1020 

Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu 
1025 1030 1035 1040 

Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val 
1045 1050 1055 

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg 
1060 1065 1070 

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala 
1075 1080 1085 

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser 
1090 1095 1100 

Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly 
1105 1110 1115 1120 

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu 
1125 1130 1135 

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg 
1140 1145 1150 

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly 
1155 1160 1165 

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser 
1170 1175 1180 

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp 
1185 1190 1195 1200 

Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr 
1205 1210 1215 

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser 
1220 1225 1230 

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala 
1235 1240 1245 

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg 
1250 1255 1260 

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile 
1265 1270 1275 1280 

Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln 
1285 1290 1295 
Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr 
1300 1305 1310 

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp 
1315 1320 1325 

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu 
1330 1335 1340 

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val 
1345 1350 1355 1360 

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp 
1365 1370 1375 

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu 
1380 1385 1390 

Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp 
1395 1400 1405 

Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe 
1410 1415 1420 

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr 
1425 1430 1435 1440 

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met 
1445 1450 1455 

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile 
1460 1465 1470 

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly 
1475 1480 1485 

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro 
1490 1495 1500 

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg 
1505 1510 1515 1520 

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro 
1525 1530 1535 

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His 
1540 1545 1550 

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met 
1555 1560 1565 
Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu 
1570 1575 1580 

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val 
1585 1590 1595 1600 

Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala 
1605 1610 1615 

Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg 
1620 1625 1630 

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala 
1635 1640 1645 

Met Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val 
1650 1655 1660 

Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys 
1665 1670 1675 1680 

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala 
1685 1690 1695 

Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn 
1700 1705 1710 

Met Ser Ile Lys Ala Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu 
1715 1720 1725 

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly 
1730 1735 1740 

Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn 
1745 1750 1755 1760 

Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg 
1765 1770 1775 

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly 
1780 1785 1790 

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe 
1795 1800 1805 

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu 
1810 1815 1820 

Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val 
1825 1830 1835 1840 

Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp 
1845 1850 1855 

Val Gly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr 
1860 1865 1870 

Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asp Lys 
1875 1880 1885 

Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala 
1890 1895 1900 

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro 
1905 1910 1915 1920 

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His 
1925 1930 1935 

Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser 
1940 1945 1950 

Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met 
1955 1960 1965 

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr 
1970 1975 1980 

Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys 
1985 1990 1995 2000 

Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Asp Ile Asn Pro 
2005 2010 2015 

Thr Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly 
2020 2025 2030 

Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp 
2035 2040 2045 

Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr 
2050 2055 2060 

Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met 
2065 2070 2075 2080 

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile 
2085 2090 2095 

Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly 
2100 2105 2110 

Asn Arg Lys Leu Ile Asn Lys Phe Ile Gln Asn Leu Lys Ser Gly Tyr 
2115 2120 2125 
Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys 
2130 2135 2140 

Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val 
2145 2150 2155 2160 

Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly 
2165 2170 2175 

Tyr Ser Ala Leu Ile Lys Asp 
2180 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15894 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
ACCAAACAAA GTTGGGTAAG GATAGTTCAA TCAATGATCA TCTTCTAGTG CACTTAGGAT 
TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAGGGAT ATCCGAGATG GCCACACTTT 
TAAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 
GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATCCCTGGA GATTCCTCAA 
TTACCACTCG ATCCAGACTT CTGGACCGGT TGGTGAGGTT AATTGGAAAC CCGGATGTGA 
GCGGGCCCAA ACTAACAGGG GCACTAATAG GTATATTATC CTTATTTGTG GAGTCTCCAG 
GTCAATTGAT TCAGAGGATC ACCGATGACC CTGACGTTAG CATAAGGCTG TTAGAGGTTG 
TCCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG 
ATGAGGCGGA CCAATACTTT TCACATGATG ATCCAATTAG TAGTGATCAA TCCAGGTTCG 
GATGGTTCGG GAACAAGGAA ATCTCAGATA TTGAAGTGCA AGACCCTGAG GGATTCAACA 
TGATTCTGGG TACCATCCTA GCCCAAATTT GGGTCTTGCT CGCAAAGGCG GTTACGGCCC 
CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG 
TAGTTGGTGA ATTTAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG 
AGGACCTCTC CTTACGCCGA TTCATGGTCG CTCTAATCCT GGATATCAAG AGAACACCCG 
GAAACAAACC CAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG 
GATTAGCCAG TTTTATCCTG ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG 
GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAC CTTTACCAGC 
AAATGGGGGA AACTGCACCC TACATGGTAA TCCTGGAGAA CTCAATTCAG AACAAGTTCA 
GTGCAGGATC ATACCCTCTG CTCTGGAGCT ATGCCATGGG AGTAGGAGTG GAACTTGAAA 
ACTCCATGGG AGGTTTGAAC TTTGGCCGAT CTTACTTTGA TCCAGCATAT TTTAGATTAG 
GGCAAGAGAT GGTAAGGAGG TCAGCTGGAA AGGTCAGTTC CACATTGGCA TCTGAACTCG 
GTATCACTGC CGAGGATGCA AGGCTTGTTT CAGAGATTGC AATGCATACT ACTGAGGACA 
AGATCAGTAG AGCGGTTGGA CCCAGACAAG CCCAAGTATC ATTTCTACAC GGTGATCAAA 
GTGAGAATGA GCTACCGAGA TTGGGGGGCA AGGAAGATAG GAGGGTCAAA CAGAGTCGAG 
GAGAAGCCAG GGAGAGCTAC AGAGAAACCG GGCCCAGCAG AGCAAGTGAT GCGAGAGCTG 
CCCATCTTCC AACCGGCACA CCCCTAGACA TTGACACTGC AACGGAGTCC AGCCAAGATC 
CGCAGGACAG TCGAAGGTCA GCTGACGCCC TGCTTAGGCT GCAAGCCATG GCAGGAATCT 
CGGAAGAACA AGGCTCAGAC ACGGACACCC CTATAGTGTA CAATGACAGA AATCTTCTAG 
ACTAGGTGCG AGAGGCCGAG GGCCAGAACA ACATCCGCCT ACCATCCATC ATTGTTATAA 
AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCCATCA ACCATCCACT CCCACGATTG 
GAGCCAATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT 
CTCAAGGCCG AGCCCATCGG CTCACTGGCC ATCGAGGAAG CTATGGCAGC ATGGTCAGAA 
ATATCAGACA ACCCAGGACA GGAGCGAGCC ACCTGCAGGG AAGAGAAGGC AGGCAGTTCG 
GGTCTCAGCA AACCATGCCT CTCAGCAATT GGATCAACTG AAGGCGGTGC ACCTCGCATC 
CGCGGTCAGG GACCTGGAGA GAGCGATGAC GACGCTGAAA CTTTGGGAAT CCCCCCAAGA 
AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATTACG TTTATGATCA CAGCGGTGAA 
GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT 
AGCACCCTCT CAGGAGGAGA CAATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT 
GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG 
GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAG ACTCCAATCC 
AGAGGCAACA ACTTTCCGAA GCTTGGGAAA ACTCTCAATG TTCCTCCGCC CCCGGACCCC 
GGTAGGGCCA GCACTTCCGG GACACCCATT AAAAAGGGCA CAGACGCGAG ATTAGCCTCA 
TTTGGAACGG AGATCGCGTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA 
CCCTCGGAAC CATCAGGGCC AGGTGCACCT GCGGGGAATG TCCCCGAGTG TGTGAGCAAT 
GCCGCACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG 
AATAATGAAG AAGGGGGAGA CTATTATGAT GATGAGCTGT TCTCTGATGT CCAAGATATT 
AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCTCCAA GCTAGAATCA 
CTGCTGTTAT TGAAGGGAGA AGTTGAGTCA ATTAAGAAGC AGATCAACAG GCAAAATATC 
AGCATATCCA CCCTGGAAGG ACACCTCTCA AGCATCATGA TCGCCATTCC TGGACTTGGG 
AAGGATCCCA ACGACCCCAC TGCAGATGTC GAAATCAATC CCGACTTGAA ACCCATCATA 
GGCAGAGATT CAGGCCGAGC ACTGGCCGAA GTTCTCAAGA AACCCGTTGC CAGCCGACAA 
CTCCAAGGAA TGACAAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAG 
CTAAAGCCGA TCGGGAAAAA GATGAGCTCA GCCGTCGGGT TTGTTCCTGA CACCGGCCCT 
GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 
CGTTACCTGA TGACTCTCCT TGATGATATC AAAGGAGCCA ATGATCTTGC CAAGTTCCAC 
CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCCCATG 
CCAGTCGACC CAACTAGTAC AACCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT 
GCCTCCCAAG GTCCACAATG ACAGAGACCT ACGACTTCGA CAAGTCGGCA TGGGACATCA 
AAGGGTCGAT CGCTCCGATA CAACCCACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG 
TCAGAGTCAT AGATCCTGGT CTAGGCGACA GGAAGGATGA ATGCTTTATG TACATGTTTC 
TGCTGGGGGT TGTTGAGGAC AGCGATTCCC TAGGGCCTCC AATCGGGCGA GCATTTGGGT 
TCCTGCCCTT AGGTGTTGGC AGATCCACAG CAAAGCCCGA AAAACTCCTC AAAGAGGCCA 
CTGAGCTTGA CATAGTTGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA 
ACAACACCCC ACTAACTCTC CTCACACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT 
TCAACGCAAA CCAAGTGTGC AATGCGGTTA ATCTGATACC GCTCGATACC CCGCAGAGGT 
TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCGGATAA CGGGTATTAC ACCGTTCCTA 
GAAGAATGCT GGAATTCAGA TCGGTCAATG CAGTGGCCTT CAACCTGCTG GTGACCCTTA 
GGATTGACAA GGCGATAGGC CCTGGGAAGA TCATCGACAA TACAGAGCAA CTTCCTGAGG 
CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA GAGTGAAGTC TACTCTGCCG 
ATTATTGCAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG 
GCACCAGTCT TCACATTAGA AGCACAGGCA AAATGAGCAA GACTCTCCAT GCACAACTCG 
GGTTCAAGAA GACCTTATGT TACCCGCTGA TGGATATCAA TGAAGACCTT AATCGATTAC 
TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCTC 
AAGAATTCCG CATTTACGAC GACGTGATCA TAAATGATGA CCAAGGACTA TTCAAAGTTC 
TGTAGACCGT AGTGCCCAGC AATGCCCGAA AACGACCCCC CTCACAATGA CAGCCAGAAG 
GCCCGGACAA AAAAGCCCCC TCCGAAAGAC TCCACGGACC AAGCGAGAGG CCAGCCAGCA 
GCCGACGGCA AGCGCGAACA CCAGGCGGCC CCAGCACAGA ACAGCCCTGA CACAAGGCCA 
CCACCAGCCA CCCCAATCTG CATCCTCCTC GTGGGACCCC CGAGGACCAA CCCCCAAGGC 
TGCCCCCGAT CCAAACCACC AACCGCATCC CCACCACCCC CGGGAAAGAA ACCCCCAGCA 
ATTGGAAGGC CCCTCCCCCT CTTCCTCAAC ACAAGAACTC CACAACCGAA CCGCACAAGC 
GACCGAGGTG ACCCAACCGC AGGCATCCGA CTCCCTAGAC AGATCCTCTC TCCCCGGCAA 
ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCAAC AGAACCCAGA CCCCGGTCCA 
CGGTGCCGCG CCCCCAACCC CCGACAACCA GAGGGAGCCC CCAACCAATC CCGCCGGCTC 
CCCCGGTGCC CACAGGCAGG GACACCAACC CCCGAACAGA CCCAGCACCC AACCATCGAC 
AATCCAAGAC GGGGGGGCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCACCGC 
GAGGAAGCCC ACCCACCCCA CACACGACCA CGGCAACCAA ACCAGAACCC AGACCACCCT 
GGGCCACCAG CTCCCAGACT CGGCCATCAC CCCGCAGAAA GGAAAGGCCA CAACCCGCGC 
ACCCCAGCCC CGATCCGGCG GGGAGCCACC CAACCCGAAC CAGCACCCAA GAGCGATCCC 
CGAAGGACCC CCGAACCGCA AAGGACATCA GTATCCCACA GCCTCTCCAA GTCCCCCGGT 
CTCCTCCTCT TCTCGAAGGG ACCAAAAGAT CAATCCACCA CACCCGACGA CACTCAACTC 
CCCACCCCTA AAGGAGACAC CGGGAATCCC AGAATCAAGA CTCATCCAAT GTCCATCATG 5460 

GGTCTCAAGG TGAACGTCTC TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAAACACCC 5520 

ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGAAT AGGAAGTGCA 5580 

AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTAG TCATAAAATT AATGCCCAAT 5640 

ATAACTCTCC TCAATAACTG CACGAGGGTA GAGATTGCAG AATACAGGAG ACTACTGAGA 5700 

ACAGTTTTGG AACCAATTAG AGATGCACTT AATGCAATGA CCCAGAATAT AAGACCGGTT 5760 

CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCGG GAGTAGTCCT GGCAGGTGCG 5820 

GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 5880 

CTGAACTCTC AAGCCATCGA CAATCTGAGA GCGAGCCTGG AAACTACTAA TCAGGCAATT 5940 

GAGACAATCA GACAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC 6000 

ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCCAGAAG 6060 

CTCGGGCTCA AATTGCTCAG ATACTATACA GAAATCCTGT CATTATTTGG CCCCAGTTTA 6120 

CGGGACCCCA TATCTGCGGA GATATCTATC CAGGCTTTGA GCTATGCGCT TGGAGGAGAC 6180 

ATCAATAAGG TGTTAGAAAA GCTCGGATAC AGTGGAGGTG ATTTACTGGG CATCTTAGAG 6240 

AGCGGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTCCTC 6300 

AGTATAGCCT ATCCGACGCT GTCCGAGATT AAGGGGGTGA TTGTCCACCG GCTAGAGGGG 6360 

GTCTCGTACA ACATAGGCTC TCAAGAGTGG TATACCACTG TGCCCAAGTA TGTTGCAACC 6420 

CAAGGGTACC TTATCTCGAA TTTTGATGAG TCATCGTGTA CTTTCATGCC AGAGGGGACT 6480 

GTGTGCAGCC AAAATGCCTT GTACCCGATG AGTCCTCTGC TCCAAGAATG CCTCCGGGGG 6540 

TACACCAAGT CCTGTGCTCG TACACTCGTA TCCGGGTCTT TTGGGAACCG GTTCATTTTA 6600 

TCACAAGGGA ACCTAATAGC CAATTGTGCA TCAATCCTTT GCAAGTGTTA CACAACAGGA 6660 

ACGATCATTA ATCAAGACCC TGACAAGATC CTAACATACA TTGCTGCCGA TCACTGCCCG 6720 

GTAGTCGAGG TGAACGGCGT GACCATCCAA GTCGGGAGCA GGAGGTATCC AGACGCTGTG 6780 

TACTTGCACA GAATTGACCT CGGTCCTCCC ATATCATTGG AGAGGTTGGA CGTAGGGACA 6840 

AATCTGGGGA ATGCAATTGC TAAGTTGGAG GATGCCAAGG AATTGTTGGA GTCATCGGAC 6900 

CAGATATTGA GGAGTATGAA AGGTTTATCG AGCACTAGCA TAGTCTACAT CCTGATTGCA 6960

GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT 7020

AACAAAAAGG GAGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACGGGA 7080 

ACATCAAAAT CCTATGTAAG GTCGCTCTGA TCCTCTACAA CTCTTGAAAC ACAAATGTCC 7140 

CACAAGTCTC CTCTTCGTCA TCAAGCAACC ACCGCACCCA GCATCAAGCC CACCTGAAAT 7200 

TATCTCCGGC TTCCCTCTGG CCGAACAATA TCGGTAGTTA ATCAAAACTT AGGGTGCAAG 7260

ATCATCCACA ATGTCACCAC AACGAGACCG GATAAATGCC TTCTACAAAG ATAACCCCCA 7320 

TCCCAAGGGA AGTAGGATAG TCATTAACAG AGAACATCTT ATGATTGATA GACCTTATGT 7380 

TTTGCTGGCT GTTCTGTTTG TCATGTTTCT GAGCTTGATC GGGTTGCTAG CCATTGCAGG 7440 

CATTAGACTT CATCGGGCAG CCATCTACAC CGCAGAGATC CATAAAAGCC TCAGCACCAA 7500 

TCTAGATGTA ACTAACTCAA TCGAGCATCA GGTGAAGGAC GTGCTGACAC CACTCTTCAA 7560 

AATCATCGGT GATGAAGTGG GCCTGAGGAC ACCTCAGAGA TTCACTGACC TAGTGAAATT 7620 

AATCTCTGAC AAGATTAAAT TCCTTAATCC GGATAGGGAG TACGACTTCA GAGATCTCAC 7680 

TTGGTGTATC AACCCGCCAG AGAGAATCAA ATTGGATTAT GATCAATACT GTGCAGATGT 7740 

GGCTGCTGAA GAGCTCATGA ATGCATTGGT GAACTCAACT CTACTGGAGA CCAGAACAAC 7800 

CAATCAGTTC CTAGCTGTCT CAAAGGGAAA CTGCTCAGGG CCCACTACAA TCAGAGGTCA 7860 

ATTCTCAAAC ATGTCGCTGT CCCTGTTAGA CTTGTATTTA GGTCGAGGTT ACAATGTGTC 7920 

ATCTATAGTC ACTATGACAT CCCAGGGAAT GTATGGGGGA ACTTACCTAG TGGAAAAGCC 7980 

TAATCTGAGC AGCAAAAGGT CAGAGTTGTC ACAACTGAGC ATGTACCGAG TGTTTGAAGT 8040 

AGGTGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATCTTGA 8100 

GCAACCAGTC AGTAATGATC TCAGCAACTG TATGGTGGCT TTGGGGGAGC TCAAACTCGC 8160 

AGCCCTTTGT CACGGGGAAG ATTCTATCAC AATTCCCTAT CAGGGATGAG GGAAAGGTGT 8220

CAGCTTCCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT 8280 

CCCCTTATCA ACGGATGATC CAGTGATAGA CAGGCTTTAC CTCTCATCTC ACAGAGGTGT 8340

TATCGCTGAC AATCAAGCAA AATGGGCTGT CCCGACAACA CGAACAGATG ACAAGTTGCG 8400 

AATGGAGACA TGCTTCCAAC AGGCGTGTAA GGGTAAAATC CAAGCACTCT GCGAGAATCC 8460 

CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGGGTCTTGT CTGTTGATCT 8520

GAGTCTGACA GTTGAGCTTA AAATCAAAAT TGCTTCGGGA TTCGGGCCAT TGATCACACA 8580

CGGTTCAGGG ATGGACCTAT ACAAATCCAA CCACAACAAT GTGTATTGGC TGACTATCCC 8640

GCCAATGAAG AACCTAGCCT TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA 8700 

GGTTAGTCCC TACCTCTTCA CTGTCCCAAT TAAGGAAGCA GGCGAAGACT GCCATGCCCC 8760 

AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTGATTCT 8820

ACCTGGTCAA GATCTCCAAT ATGTTTTGGC AACCTACGAT ACTTCCAGGG TTGAACATGC 8880

TGTGGTTTAT TACGTTTACA GCCCAAGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT 8940

GCCTATAAAG GGGGTCCCCA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT 9000 

CTGGTGCCGT CACTTCTGTG TGCTTGCGGA CTCAGAATCT GGTGGACATA TCACTCACTC 9060

TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACCCGG GAAGATGGAA CCAATCGCAG 9120

ATAGGGCTGC TAGTGAACCA ATCACATGAT GTCACCCAGA CATCAGGCAT ACCCACTAGT 9180 

GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTCCCC GTTATGGACT 9240

CGCTATCTGT CAACCAGATC TTATACCCTG AAGTTCACCT AGATAGCCCG ATAGTTACCA 9300 

ATAAGATAGT AGCCATCCTG GAGTATGCTC GAGTCCCTCA CGCTTACAGC CTGGAGGACC 9360

CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTTTCCAAC CAAATGATTA 9420 

TAAACAATGT GGAAGTTGGG AATGTCATCA AGTCCAAGCT TAGGAGTTAT CCGGCCCACT 9480 

CTCATATTCC ATATCCAAAT TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA 9540 

CGAGGAAGAT CCGTGAACTC CTCAAAAAGG GGAATTCGCT GTACTCCAAA GTCAGTGATA 9600 

AGGTTTTCCA ATGCTTAAGG GACACTAACT CACGGCTTGG CCTAGGCTCC GAATTGAGGG 9660

AGGACATCAA GGAGAAAGTT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAGTGGTTTG 9720

AGCCCTTTCT GTTTTGGTTT ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA 9780 

CCCATACTTG CCATAGGAGG AGACACACAC CTGTATTCTT CACTGGTAGT TCAGTTGAGT 9840 

TGCTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAAGA GTCTCAACAT GTATATTACC 9900

TGACATTTGA ACTGGTTTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA 9960 

CCGCTATGAC TATTGATGCT AGGTATACAG AGCTTCTAGG AAGAGTCAGA TACATGTGGA 10020

AACTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTATCAAATT GTAGCCATGC 10080

TGGAGCCTCT TTCACTTGCT TACCTGCAGC TGAGGGATAT AACAGTAGAA CTCAGAGGTG
CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG
ATGAAGGTAC TTATCATGAG TTAACTGAAG CTCTAGATTA CATTTTCATA ACTGATGACA
TACATCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG 
CAGTAACGGC TGCTGAAAAT GTTAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG 
AGACTCTGAT GAAAGGTCAT GCCATATTTT GTGGAATCAT AATCAACGGC TATCGTGACA 
GGCACGGAGG CAGTTGGCCA CCGCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA 
ATGCTCAAGC TTCAGGTGAA GGGTTAACAC ATGAGCAGTG CGTTGATAAC TGGAAATCTT 
TTGCTGGAGT GAAATTTGGC TGCTTTATGC CTCTTAGCCT GGATAGTGAT CTGACAATGT 
ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG
AGTTCCTGCG TTACGACCCT CCCAAGGGAA CCGGGTCACG GAGGCTTGTA GATGTTTTCC 
TTAATGATTC GAGCTTTGAC CCATATGATG TGATAATGTA TGTTGTAAGT GGAGCTTACC 
TCCATGACCC TGAGTTCAAC CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG
GTAGACTTTT TGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATT GCTGAAAATC
TAATCTCAAA CGGGATTGGC AAATATTTTA AGGACAATGG GATGGCCAAG GATGAGCACG 
ATTTGACTAA GGCACTCCAC ACTCTAGCTG TCTCAGGAGT CCCCAAAGAT CTCAAAGAAA
GTCACAGGGG GGGGCCAGTC TTAAAAACCT ACTCCCGAAG CCCAGTCCAC ACAAGTACCA 
GGAACGTGAG AGCAGCAAAA GGGTTTATAG GGTTCCCTCA AGTAATTCGG CAGGACCAAG 
ACACTGATCA TCCGGAGAAT ATGGAAGCTT ACGAGACAGT CAGTGCATTT ATCACGACTG 
ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTGTTT GCACAGAGGC 
TAAATGAGAT TTACGGATTG CCCTCATTTT TCCAGTGGCT GCATAAGAGG CTTGAGACCT 
CTGTCCTGTA TGTAAGTGAC CCTCATTGCC CCCCCGACCT TGACGCCCAT ATCCCGTTAT 
ATAAAGTCCC CAATGATCAA ATCTTCATTA AGTACCCTAT GGGAGGTATA GAAGGGTATT 
GTCAGAAGCT GTGGACCATC AGCACCATTC CCTATCTATA CCTGGCTGCT TATGAGAGCG 
GAGTAAGGAT TGCTTCGTTA GTGCAAGGGG ACAATCAGAC CATAGCCGTA ACAAAAAGGG 
TACCCAGGAC ATGGCCCTAC AACCTTAAGA AACGGGAAGC TGCTAGAGTA ACTAGAGATT

ACTTTGTAAT TCTTAGGCAA AGGCTACATG ATATTGGCCA TCACCTCAAG GCAAATGAGA 11700

CAATTGTTTC ATCACATTTT TTTGTCTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG 11760 

TGTCCCAATC ACTCAAGAGC ATCGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG 11820 

AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 11880 

ATGACCGTTA CCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAA ATTCTGATCT 11940 

CTCTTGGCTT CACAATCAAT TCAACCATGA CCCGGGATGT AGTCATACCC CTCCTCACAA 12000 

ACAACGACCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATTGGGGGG ATGAATTATC 12060 

TGAATATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 12120

ATCTCAAGAG AATGATTCTC GCCTCACTAA TGCCTGAAGA GACCCTCCAT CAAGTAATGA 12180 

CACAACAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC 12240 

TTGTATGTGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCCTGA 12300 

TCCATAGTCC AAACCCAATG TTAAAAGGAT TATTCCATGA TGACAGTAAA GAAGAGGACG 12360 

AGGGACTGGC GGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCATGAAA 12420 

TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTG GATACCACAA 12480 

AAGGCTTGAT TCGAGCCAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT 12540 

TGTCCAATTA TGACTATGAA CAATTCAGAG CAGGGATGGT GCTATTGACA GGAAGAAAGA 12600 

GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCGAGAGCT CTAAGAAGCC 12660 

ATATGTGGGC GAGGCTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC 12720 

TAGAATCTAT GCGAGGCCAC CTTATTCGGC GTCATGAGAC ATGTGTCATC TGCGAGTGTG 12780 

GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA 12840 

AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA 12900 

TGAAGCTTGC CTTCGTAAGA GCCCCAAGTC GATCCTTGCG ATCTGCTGTT AGAATAGCAA 12960 

CAGTGTACTC ATGGGCTTAC GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG 13020 

CTAGGCAAAG GGCCAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCAACTT 13080 

CGACTAATTT AGCGCATAGG TTGAGGGATC GTAGCACTCA AGTGAAATAC TCAGGTACAT 13140 

CCCTTGTCCG AGTGGCGAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT 13200 



CAGATAAGAA GGTTGATACT AACTTTATAT ACCAACAAGG AATGCTTCTA GGGTTGGGTG 
TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACCGGATC ATCTAACACG GTATTACATC
TTCACGTCGA AACAGATTGT TGCGTGATCC CGATGATAGA TCATCCCAGG ATACCCAGCT 
CCCGCAAGCT AGAGCTGAGG GCAGAGCTAT GTACCAACCC ATTGATATAT GATAATGCAC 
CTTTAATTGA CAGAGATGCA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTGG
AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTTT AGCTAAGTCC ACAGCACTAT
CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA
TAGGGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTCATAGAG CCAAGATTAT 
TCACTATCTA CTTGGGCCAG TGTGCGGCCA TCAATTGGGC ATTTGATGTA CATTATCATA 
GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCATC GTTCCTTTCT AGAATGAGCA 
AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT 
GGCATTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTGCACA 
CAACTGTGTG CAACATGGTT TACACATGCT ATATGACCTA CCTCGACCTG TTGTTGAATG 
AAGAGTTAGA AGAGTTCACA TTTCTCTTGT GTGAAAGCGA CGAGGATGTA GTACCGGACA 
GATTCGACAA CATCCAGGCA AAACACTTAT GTGTTCTGGC AGATTTGTAC TGTCAACCAG 
GGACCTGCCC ACCAATTCGA GGTCTAAGAC CGGTAGAGAA ATGTGCAGTT CTAACCGACC
ATATCAAGGC AGAGGCTATG TTATCTCCAG CAGGATCTTC GTGGAACATA AATCCAATTA 
TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTCCGGCG AGGATCGATC AAACAGATAA 
GATTGAGAGT TGATCCAGGA TTCATTTTCG ACGCCCTCGC TGAGGTAAAT GTCAGTCAGC 
CAAAGATCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGCTTTCAGA CCCCCACACG
ATGATGTTGC AAAATTGCTC AAAGATATCA ACACAAGCAA GCACAATCTT CCCATTTCAG 
GGGGCAATCT CGCCAATTAT GAAATCCATG CTTTCCGCAG AATCGGGTTG AACTCATCTG 
CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAGG 
ACGGCTTGTT CTTGGGTGAG GGATCGGGTT CTATGTTGAT CACTTATAAA GAGATACTTA 
AACTAAACAA GTGCTTCTAT AATAGTGGGG TTTCCGCCAA TTCTAGATCT GGTCAAAGGG 
AATTAGCACC CTATCCCTCC GAAGTTGGCC TTGTCGAACA CAGAATGGGA GTAGGTAATA
TTGTCAAAGT GCTCTTTAAC GGGAGGCCCG AAGTCACGTG GGTAGGCAGT GTAGATTGCT
TCAATTTCAT AGTTAGTAAT ATCCCTACCT CTAGTGTGGG GTTTATCCAT TCAGATATAG
AGACCTTGCC TGACAAAGAT ACTATAGAGA AGCTAGAGGA ATTGGCAGCC ATCTTATCGA
TGGCTCTGCT CCTGGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 
GGGATTTTGT TCAGGGATTT ATAAGTTATG TAGGGTCTCA TTATAGAGAA GTGAACCTTG 
TATACCCTAG ATACAGCAAC TTCATCTCTA CTGAATCTTA TTTGGTTATG ACAGATCTCA 
AGGCTAACCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCTGTGA 
GGACTTCACC TGGACTTATA GGTCACATCC TATCCATTAA GCAACTAAGC TGCATACAAG 
CAATTGTGGG AGACGCAGTT AGTAGAGGTG ATATCAATCC TACTCTGAAA AAACTTACAC 
CTATAGAGCA GGTGCTGATC AATTGCGGGT TGGCAATTAA CGGACCTAAG CTGTGCAAAG 
AATTGATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAATTCT ATACTCATCC 
TCTACAGGGA GTTGGCAAGA TTCAAAGACA ACCAAAGAAG TCAACAAGGG ATGTTCCACG 
CTTACCCCGT ATTGGTAAGT AGCAGGCAAC GAGAACTTAT ATCTAGGATC ACCCGCAAAT 
TCTGGGGGCA CATTCTTCTT TACTCCGGGA ACAAAAAGTT GATAAATAAG TTTATCCAGA 
ATCTCAAGTC CGGCTATCTG ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT 
CCAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTGAA ACGTGAGTGG GTTTTTAAGG 
TAACAGTCAA GGAGACCAAA GAATGGTATA AGTTAGTCGG ATACAGTGCC CTGATTAAGG
ACTAATTGGT TGAACTCCGG AACCCTAATC CTGCCCTAGG TGGTTAGGCA TTATTTGCAA 
TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2183 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu
1 5 10 15

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala
20 25 30

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn
35 40 45

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn
50 55 60

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro
65 70 75 80

Ala His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn
85 90 95

Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys
100 105 110

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asp Lys Val Phe Gln Cys Leu
115 120 125

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp
130 135 140

Ile Lys Glu Lys Val Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln
145 150 155 160

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg
165 170 175

Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr
180 185 190

Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp
195 200 205

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr
210 215 220

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met
225 230 235 240

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Thr Glu Leu Leu Gly
245 250 255

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu
260 265 270

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu
275 280 285

Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe
290 295 300

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly
305 310 315 320

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Thr Glu Ala Leu Asp Tyr
325 330 335

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe
340 345 350

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu
355 360 365

Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr
370 375 380

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr
385 390 395 400

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His
405 410 415

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr
420 425 430

His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe
435 440 445

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu
450 455 460

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr
465 470 475 480

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg
485 490 495

Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp
500 505 510

Val Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe
515 520 525

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg
530 535 540

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala
545 550 555 560

Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly
565 570 575

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala
580 585 590

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro
595 600 605

Val Leu Lys Thr Tyr Ser Arg Ser Pro Val His Thr Ser Thr Arg Asn
610 615 620

Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro Gln Val Ile Arg Gln
625 630 635 640

Asp Gln Asp Thr Asp His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val
645 650 655

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly
675 680 685

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val
690 695 700

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Ile
705 710 715 720

Pro Leu Tyr Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met
725 730 735

Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile
740 745 750

Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser
755 760 765

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro
770 775 780

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Arg Glu Ala Ala Arg Val Thr
785 790 795 800

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His
805 810 815

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr
820 825 830

Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys
835 840 845

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr
850 855 860

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu
865 870 875 880

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val
885 890 895

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met
900 905 910

Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile
915 920 925

Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn
930 935 940

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser
945 950 955 960

Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu
965 970 975

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser
995 1000 1005

Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His
1010 1015 1020

Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu
1025 1030 1035 1040

Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val
1045 1050 1055

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg
1060 1065 1070

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala
1075 1080 1085

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser
1090 1095 1100

Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly
1105 1110 1115 1120

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu
1125 1130 1135

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg
1140 1145 1150

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly
1155 1160 1165

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser
1170 1175 1180

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp
1185 1190 1195 1200

Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr
1205 1210 1215

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser
1220 1225 1230

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala
1235 1240 1245

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg
1250 1255 1260

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile
1265 1270 1275 1280

Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln
1285 1290 1295

Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr
1300 1305 1310

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp
1315 1320 1325

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu
1330 1335 1340

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val
1345 1350 1355 1360

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp
1365 1370 1375

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu
1380 1385 1390

Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp
1395 1400 1405

Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe
1410 1415 1420

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr
1425 1430 1435 1440

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met
1445 1450 1455

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile
1460 1465 1470

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly
1475 1480 1485

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro
1490 1495 1500

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg
1505 1510 1515 1520

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro
1525 1530 1535

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His
1540 1545 1550

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met
1555 1560 1565

Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu
1570 1575 1580

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val
1585 1590 1595 1600

Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala
1605 1610 1615

Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg
1620 1625 1630

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala
1635 1640 1645

Met Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val
1650 1655 1660

Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys
1665 1670 1675 1680

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala
1685 1690 1695

Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn
1700 1705 1710

Met Ser Ile Lys Ala Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu
1715 1720 1725

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly
1730 1735 1740

Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn
1745 1750 1755 1760

Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg
1765 1770 1775

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly
1780 1785 1790

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe
1795 1800 1805

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu
1810 1815 1820

Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val
1825 1830 1835 1840

Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp
1845 1850 1855

Val Gly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr
1860 1865 1870

Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asp Lys
1875 1880 1885

Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala
1890 1895 1900

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro
1905 1910 1915 1920

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His
1925 1930 1935

Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser
1940 1945 1950

Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met
1955 1960 1965

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr
1970 1975 1980

Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys
1985 1990 1995 2000

Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Asp Ile Asn Pro
2005 2010 2015

Thr Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly
2020 2025 2030

Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp
2035 2040 2045

Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr
2050 2055 2060

Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met
2065 2070 2075 2080

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile
2085 2090 2095

Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly
2100 2105 2110

Asn Lys Lys Leu Ile Asn Lys Phe Ile Gln Asn Leu Lys Ser Gly Tyr
2115 2120 2125

Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys
2130 2135 2140

Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val
2145 2150 2155 2160

Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly
2165 2170 2175

Tyr Ser Ala Leu Ile Lys Asp
2180

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15894 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

ACCAAACAAA GTTGGGTAAG GATAGTTCAA TCAATGATCA TTTTCTAGTG CACTTAGGAT 60

TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAAGGAT ATCCGAGATG GCCACACTTT 120 

TAAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 180 

GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATCCCTGGA GATTCCTCAA 240 

TTACCACTCG ATCCAGACTT CTGGACCGGT TGGTCAGGTT AATTGGAAAC CCGGATGTGA 300 

GCGGGCCCAA ACTAACAGGG GCACTAATAG GTATATTATC CTTATTTGTG GAGTCTCCAG 360 

GTCAATTGAT TCAGAGGATC ACCGATGACC CTGACGTTAG CATAAGGCTG TTAGAGGTTG 420 

TCCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG 480 

ATGAGGCGGA CCAATACTTT TCACATGATG ATCCAATTAG TAGTGATCAA TCCAGGTTCG 540 

GATGGTTCGA GAACAAGGAA ATCTCAGATA TTGAAGTGCA AGACCCTGAG GGATTCAACA 600 

TGATTCTGGG TACCATCCTA GCTCAAATTT GGGTCTTGCT CGCAAAGGCG GTTACGGCCC 660 

CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG 720 

TAGTTGGTGA ATTTAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG 780 

AGGACCTCTC CTTACGCCGA TTCATGGTCG CTCTAATCCT GGATATCAAG AGAACACCCG 840 

GAAACAAACC CAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG 900 

GATTAGCCAG TTTTATCCTG ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG 960 

GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAC CTTTACCAGC 1020 

AAATGGGGGA AACTGCACCC TACATGGTAA TCCTGGAGAA CTCAATTCAG AACAAGTTCA 1080

GTGCAGGATC ATACCCTCTG CTCTGGAGCT ATGCCATGGG AGTAGGAGTG GAACTTGAAA 1140 

ACTCCATGGG AGGTTTGAAC TTTGGCCGAT CTTACTTTGA TCCAGCATAT TTTAGATTAG 1200 

GGCAAGAGAT GGTAAGGAGG TCAGCTGGAA AGGTCAGTTC CACATTGGCA TCTGAACTCG 1260

GTATCACTGC CGAGGATGCA AGGCTTGTTT CAGAGATTGC AATGCATACT ACTGAGGACA 1320

AGATCAGTAG AGCGGTTGGA CCCAGACAAG CCCAAGTATC ATTTCTACAC GGTGATCAAA 1380

GTGAGAATGA GCTACCGAGA TTGGGGGGCA AGGAAGATAG GAGGGTCAAA CAGAGTCGAG 1440

GAGAAGCCAG GGAGAGCTAC AGAGAAACCG GGCCCAGCAG AGCAAGTGAT GCGAGAGCTG 1500

CCCATCTTCC AACCGGCACA CCCCTAGACA TTGACACTGC ATCGGAGTCC AGCCAAGATC 1560

CGCAGGACAG TCGAAGGTCA GCTGACGCCC TGCTTAGGCT GCAAGCCATG GCAGGAATCT 1620

CGGAAGAACA AGGCTCAGAC ACGGACACCC CTATAGTGTA CAATGACAGA AATCTTCTAG 1680

ACTAGGTGCG AGAGGCCGAG GGCCAGAACA ACATCCGCCT ACCCTCCATC ATTGTTATAA 1740

AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCCATCA ACCATCCACT CCCACGATTG 1800

GAGCCGATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT 1860

CTCAAGGCCG AGCCCATCGG CTCACTGGCC ATCGAGGAAG CTATGGCAGC ATGGTCAGAA 1920

ATATCAGACA ACCCAGGACA GGAGCGAGCC ACCTGCAGGG AAGAGAAGGC AGGCAGTTCG 1980

GGTCTCAGCA AACCATGCCT CTCAGCAATT GGATCAACTG AAGGCGGTGC ACCTCGCATC 2040

CGCGGTCAGG GACCTGGAGA GAGCGATGAC GACGCTGAAA CTTTGGGAAT CCCCCCAAGA 2100

AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATTATG TTTATGATCA CAGCGGTGAA 2160

GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT 2220

AGCACCCTCT CAGGAGGAGA CAATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT 2280

GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG 2340

GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAG ACTCCAATCC 2400

AGAGGCAACA ACTTTCCGAA GCTTGGGAAA ACTCTCAATG TTCCTCCGCC TCCGGACCCC 2460

GGTAGGGCCA GCACTTCCGG GACACCCATT AAAAAGGGCA CAGACGCGAG ATTAGCCTCA 2520

TTTGGAACGG AGATCGCGTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA 2580

CCCTCGGAAC CATCAGGGCC AGGTGCACCT GCGGGGAATG TCCCCGAGTG TGTGAGCAAT 2640

GCCGCACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG 2700

AATAATGAAG AAGGGGGAGA CTATTATGAT GATGAGCTGT TCTCTGATGT CCAAGATATT 2760

AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCTCCAA GCTAGAATCA 2820

CTGCTGTTAT TGAAGGGAGA AGTTGAGTCA ATTAAGAAGC AGATCAACAG GCAAAATATC 2880

AGCATATCCA CCCTGGAAGG ACACCTCTCA AGCATCATGA TCGCCATTCC TGGACTTGGG 2940

AAGGATCCCA ACGACCCCAC TGCAGATGTC GAAATCAATC CCGACTTGAA ACCCATCATA 3000

GGCAGAGATT CAGGCCGAGC ACTGGCCGAA GTTCTCAAGA AACCCGTTGC CAGCCGACAA 3060

CTCCAAGGAA TGACAAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAG 3120

CTAAAGCCGA TCGGGAAAAA GATGAGCTCA GCCGTCGGGT TTGTTCCTGA CACCGGCCCT 3180

GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 3240

CGTTACCTGA TGACTCTCCT TGATGATATC AAAGGAGCCA ATGATCTTGC CAAGTTCCAC 3300

CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCCCATG 3360

CCAGTCGACC CAACTAGTAC AACCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT 3420

GCCTCCCAAG TTCCACAATG ACAGAGATCT ACGACTTCGA CAAGTCGGCA TGGGACATCA 3480

AAGGGTTGAT CGCTCCGATA CAACCCACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG 3540

TCAGAGTCAT AGATCCTGGT CTAGGCGACA GGAAGGATGA ATGCTTTATG TACATGTTTC 3600

TGCTGGGGGT TGTTGAGGAC AGCGATCCCC TAGGGCCTCC AATCGGGCGA GGATTTGGGT 3660

CCCTGCCCTT AGGTGTTGGC AAATCCACAG CAAAGCCCGA AAAACTCCTC AAAGAGGCCA 3720

CTGAGCTTGA CATAGTTGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA 3780

ACAACACCCC ACTAACTCTC CTCACACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT 3840

TCAACGCAAA CCAAGTGTGC AGTGCGGTTA ATCTGATACC GCTCGATACC CCGCAGAGGT 3900

TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCGGATAA CGGGTATTAC ACCGTTCCTA 3960

GAAGAATGCT GGAATTCAGA TCGGTCAATG CAGTGGCCTT CAACCTGCTG GTGACCCTTA 4020

GGATTGACAA GGCGATAGGC CCTGGGAAGA TCATCGACAA TACAGAGCAA CTTCCTGAGG 4080

CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA GAGTGAAGTC TACTCTGCCG 4140

ATTATTGCAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG 4200

GCACCAGTCT TCACATTAGA AGCACAGGCA AAATGAGCAA GACTCTCCAT GCACAACTCG 4260

GGTTCAAGAA GACCTTATGT TACCCGCTGA TAGATATCAA TGAAGACCTT AATCGATTAC 4320

TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCTC 4380

AAGAATTCCG CATTTACGAC GACGTGATCA TAAATGATGA CCAAGGACTA TTCAAAGTTC 4440

TGTAGACCGT AGTGCCCAGC AATGCCCGAA AACGACCCCC CTCACAATGA CAGCCAGAAG 4500

GCCCGGACAA AAAAGCCCCC TCCGAAAGAC TCCACGGACC AAGCGAGAGG CCAGCCAGCA 4560

GCCGACGGCA AGCGCGAACA CCAGGCGGCC CCAGCACAGA ACAGCCCTGA TACAAGGCCA 4620

CCACCAGCCA CCCCAATCTG CATCCTCCTC GTGGGACCCC CGAGGACCAA CCCCCAAGGC 4680

TGCCCCCGAT CCAAACCACC AACCGCATCC CCACCACCCC CGGGAAAGAA ACCCCCAGCA 4740

ATTGGAAGGC CCCTCCCCCT CTTCCTCAAC ACAAGAACTC CACAACCGAA CCGCACAAGC 4800

GACCGAGGTG ACCCAACCGC AGGCATCCGA CTCCCTAGAC AGATCCTCTC TCCCCGGCAA 4860

ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCAAC AGAACCCAGA CCCCGGCCCA 4920

CGGCGCCGCG CCCCCAACCC CCGACAACCA GAGGGAGCCC CCAACCAATC CCGCCGGCTC 4980

CCCCGGTGCC CACAGGCAGG GACACCAACC CCCGAACAGA CCCAGCACCC AACCATCGAC 5040

AATCCAAGAC GGGGGGGCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCACCGC 5100

GAGGAAGCCC ACCCACCCCA CACACGACCA CGGCAACCAA ACCAGAACCC AGACCACCCT 5160

GGGCCACCAG CTCCCAGACT CGGCCATCAC CCCGCAGAAA GGAAAGGCCA CAACCCGCGC 5220

ACCCCAGCCC CGATCCGGCG GGGAGCCACC CAACCCGAAC CAGCACCCAA GAGCGATCCC 5280

CGAAGGACCC CCGAACCGCA AAGGACATCA GTATCCCACA GCCTCTCCAA GTCCCCCGGT 5340

CTCCTCCCCT TCTCGAAGGG ACCAAAAGAT CAATCCACCA CACCCGACGA CACTCAACTC 5400

CCCACCCCTA AAGGAGACAC CGGGAATCCC AGAATCAAGA CTCATCCAAT GTCCATCATG 5460

GGTCTCAAGG TGAACGTCTC TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAAACACCC 5520

ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGAAT AGGAAGTGCA 5580

AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTAG TCATAAAATT AATGCCCAAT 5640

ATAACTCTCC TCAATAACTG CACGAGGGTA GAGATTGCAG AATACAGGAG ACTACTGAGA 5700

ACAGTTTTGG AACCAATTAG AGATGCACTT AATGCAATGA CCCAGAATAT AAGACCGGTT 5760

CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCGG GAGTAGTCCT GGCAGGTGCG 5820

GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 5880

CTGAACTCTC AAGCCATCGA CAATCTGAGA GCGAGCCTGG AAACTACTAA TCAGGCAATT 5940

GAGGCAATCA GACAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC 6000

ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCGAGAAG 6060

CTCGGGCTCA AATTGCTCAG ATACTATACA GAAATCCTGT CATTATTTGG CCCCAGCTTA 6120

CGGGACCCCA TATCTGCGGA GATATCTATC CAGGCTTTGA GCTATGCGCT TGGAGGAGAC 6180

ATCAATAAGG TGTTAGAAAA GCTCGGATAC AGTGGAGGTG ATTTACTGGG CATCTTAGAG 6240

AGCAGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTCCTC 6300

AGTATAGCCT ATCCGACGCT GTCCGAGATT AAGGGGGTGA TTGTCCACCG GCTAGAGGGG 6360

GTCTCGTACA ACATAGGCTC TCAAGAGTGG TATACCACTG TGCCCAAGTA TGTTGCAACC 6420

CAAGGGTACC TTATCTCGAA TTTTGATGAG TCATCGTGTA CTTTCATGCC AGAGGGGACT 6480

GTGTGCAGCC AAAATGCCTT GTACCCGATG AGTGCTCTGC TCCAAGAATG CCTCCGGGGG 6540

TCCACCAAGT CCTGTGCTCG TACACTCGTA TCCGGGTCTT TTGGGAACCG GTTCATTTTA 6600

TCACAAGGGA ACCTAATAGC CAATTGTGCA TCAATCCTTT GCAAGTGTTA CACAACAGGA 6660

ACGATCATTA ATCAAGACCC TGACAAGATC CTAACATACA TTGCTGCCGA TCACTGCCCG 6720

GTAGTCGAGG TGAACGGCGT GACCATCCAA GTCGGGAGCA GGAGGTATCC AGATGCTGTG 6780

TACTTGCACA GAATTGACCT CGGTCCTCCC ATATCATTGG AGAGGTTGGA CGTAGGGACA 6840

AATCTGGGGA ATGCAATTGC TAAGTTGGAG GATGCCAAGG AATTGTTGGA GTCATCGGAC 6900

CAGATATTGA GGAGTATGAA AGGTTTATCG AGCACTAGCA TAGTCTACAT CCTGATTGCA 6960

GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT 7020

AACAAAAAGG GAGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACGGGA 7080

ACATCAAAAT CCTATGTAAG GTCGCTCTGA TCCTCTACAA CTCTTGAAAC ACAAATGTCC 7140

CACAAGTCTC CTCTTCGTCA TCAAGCAACC ACCGCACCCA GCATCAAGCC CACCTGAAAT 7200

TATCTCCGGC TTCCCTCTGG CCGAACAATA TCGGTAGTTA ATTAAAACTT AGGGTGCAAG 7260

ATCATCCACA ATGTCACCAC AACGAGACCG GATAAATGCC TTCTACAAAG ATAACCCCCA 7320

TCCCAAGGGA AGTAGGATAG TCATTAACAG AGAACATCTT ATGATTGATA GACCTTATGT 7380

TTTGCTGGCT GTTCTGTTTG TCATGTTTCT GAGCTTGATC GGGTTGCTAG CCATTGCAGG 7440

CATTAGACTT CATCGGGCAG CCATCTACAC CGCAGAGATC CATAAAAGCC TCAGCACCAA 7500

TCTAGATGTA ACTAACTCAA TCGAGCATCA GGTCAAGGAC GTGCTGACAC CACTCTTCAA
AATCATCGGT GATGAAGTGG GCCTGAGGAC ACCTCAGAGA TTCACTGACC TAGTGAAATT
CATCTCTGAC AAGATTAAAT TCCTTAATCC GGATAGGGAG TACGACTTCA GAGATCTCAC
TTGGTGTATC AACCCGCCAG AGAGAATCAA ATTGGATTAT GATCAATACT GTGCAGATGT 
GGCTGCTGAA GAGCTCATGA ATGCATTGGT GAACTCAACT CTACTGGAGA CCAGAACAAC 
CAATCAGTTC CTAGCTGTCT CAAAGGGAAA CTGCTCAGGG CCCACTACAA TCAGAGGTCA 
ATTCTCAAAC ATGTCGCTGT CCCTGTTAGA CTTGTATTTA GGTCGAGGTT ACAATGTGTC 
ATCTATAGTC ACTATGACAT CCCAGGGAAT GTATGGGGGA ACTTACCTAG TGGAAAAGCC 
TAATCTGAGC AGCAAAAGGT CAGAGTTGTC ACAACTGAGC ATGTACCGAG TGTTTGAAGT 
AGGTGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATCTTGA 
GCAACCAGCC AGTAATGATC TCAGCAACTG TATGGTGGCT TTGGGGGAGC TCAAACTCGC 
AGCCCTTTGT CACGGGOAAG ATTCTATCAC AATTCCCTAT CAGGGATCAG GGAAAGGTGT 
CAGCTTCCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT 
CCCCTTATCA ACGGATGATC CAGTGATAGA CAGGCTTTAC CTCTCATCTC ACAGAGGTGT 
TATCGCTGAC AATCAAGCAA AATGGGCTGT CCCGACAACA CGAACAGATG ACAAGTTGCG 
AATGGAGACA TGCTTCCAAC AGGCGTGTAA GGGTAAAATC CAAGCACTCT GCGAGAATCC 
CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGGGTCTTGT CTGTTGATCT 
GAGTCTGACA GTTGAGCTTA AAATCAAAAT TGCTTCGGGA TTCGGGCCAT TGATCACACA 
CGGTTCAGGG ATGGACCTAT ACAAATCCAA CCACAACAAT GTGTATTGGC TGACTATCCC 
GCCAATGAAG AACCTAGCCT TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA 
GGTTAGTCCC TACCTCTTCA ATGTCCCAAT TAAGGAAGCA GGCGAAGACT GCCATGCCCC 
AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTGATTCT 
ACCTGGTCAA GATCTCCAAT ATGTTTTGGC AACCTACGAT ACTTCCAGGG TTGAACATGC 
TGTGGTTTAT TACGTTTACA GCCCAGGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT 
GCCTATAAAG GGGGTCCCCA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT 
CTGOTGCCGT CACTTCTGTG TGCTTGCGGA CTCAGAATCT GGTGGACATA TCACTCACTC 



7560 

7620 

7680 

7740 

7800 

7860 

7920 

7980 

8040 

8100 

8160 

8220 

8280 

8340 

8400 

8460 

8520 

8580 

8640 

8700 

8760 

8820 

8880 

8940 

9000 

9060 
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TGGGATOQTG GGCATGGGAG TCA6CT6CAC AGTCACCCGG OAAGATOGAA CCAATCGCAG 9120 

ATAGGGCTGC TAGTGAACCA ATCTCATGAT GTCACCCAGA CATCAGGCAT ACCCACTAGT 9180

GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTCCCC GTTATGGACT 9240 

CGCTATCTGT CAACCAGATC TTATACCCTG AAGTTCACCT AGATAGCCCG ATAGTTACCA 9300 

ATAAGATAGT AGCCATCCTG GAGTATGCTC GAGTCCCTCA CGCTTACAGC CTGGAGGACC 9360 

CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTTTCCAAC CAAATGATTA 9420 

TAAACAATGT GGAAGTTGGG AATGTCATCA AGTCCAAGCT TAGGAGTTAT CCGGCCCACT 9480 

CTCATATTCC ATATCCAAAT TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA 9540 

CGAGGAAGAT CCGTGAACTC CTCAAAAAGG GGAATTCGCT GTACTCCAAA GTCAGTGATA 9600 

AGGTTTTCCA ATGCTTAAGG GACACTAACT CACGGCTTGG CCTAGGCTCC GAATTGAGGG 9660 

AGGACATCAA GGAGAAAGTT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAGTGGTTTG 9720 

AGCCCTTTCT GTTTTGGTTT ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA 9780 

CCCATACTTG CCATAGGAGG AGACACACAC CTGTATTCTT CACTGGTAGT TCAGTTGAGT 9840 

TGCTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAAGA GTCTCAACAT GTATATTACC 9900 

TGACATTTGA ACTGGTTTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA 9960 

CCGCTATGAC TATTGATGCT AGGTATACAG AGCTTCTAGG AAGAGTCAGA TACATGTGGA 10020 

AACTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTATCAAATT GTAGCCATGC 10080

TGGAGCCTCT TTCACTTGCT TACCTGCAGC TGAGGGATAT AACAGTAGAA CTCAGAGGTG 10140 

CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG 10200 

ATGAAGGTAC TTATCATGAG TTAATTGAAG CTCTAGATTA CATTTTCATA ACTGATGACA 10260 

TACATCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG 10320 

CAGTAACGGC TGCTGAAAAT GTTAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG 10380 

AGACTCTGAT GAAAGGTCAT GCCATATTTT GTGGAATCAT AATCAACGGC TATCGTGACA 10440 

GGCACGGAGG CAGTTGGCCA CCGCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA 10500 

ATGCTCAAGC TTCAGGTGAA GGGTTAACAC ATGAGCAGTG CGTTGATAAC TGGAAATCTT 10560 

TTGCTGGAGT GAAATTTGGC TGCTTTATGC CTCTTAGCCT GGATAGTGAT CTGACAATGT 10620 






ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG 
AGTTCCTGCG TTACGACCCT CCCAAGGGAA CCGGGTCACG GAGGCTTGTA GATGTTTTCC
TTAATGATTC GAGCTTTGAC CCATATGATG TGATAATGTA TGTTGTAAGT GGAGCTTACC
TCCATGACCC TGAGTTCAAC CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG 
GTAGACTTTT TGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATT GCTGAAAATC 
TAATCTCAAA CGGGATTGGC AAATATTTTA AGGACAATGG GATGGCCAAG GATGAGCACG
ATTTGACTAA GGCACTCCAC ACTCTAGCTG TCTCAGGAGT CCCCAAAGAT CTCAAAGAAA 
GTCACAGGGG GGGGCCAGTC TTAAAAACCT ACTCCCGAAG CCCAGTCCAC ACAAGTACCA
GGAACGTGAG AGCAGCAAAA GGGTTTATAG GGTTCCCTCA AGTAATTCGG CAGGACCAAG 
ACACTGATCA TCCGGAGAAT ATGGAAGCTT ACGAGACAGT CAGTGCATTT ATCACGACTG 
ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTGTTT GCACAGAGGC 
TAAATGAGAT TTACGGATTG CCCTCATTTT TCCAGTGGCT GCATAAGAGG CTTGAGACCT 
CTGTCCTGTA TGTAAGTGAC CCTCATTGCC CCCCCGACCT TGACGCCCAT ATCCCGTTAT 
ATAAAGTCCC CAATGATCAA ATCTTCATTA AGTACCCTAT GGGAGGTATA GAAGGGTATT 
GTCAGAAGCT GTGGACCATC AGCACCATTC CCTATCTATA CCTGGCTGCT TATGAGAGCG
GAGTAAGGAT TGCTTCGTTA GTGCAAGGGG ACAATCAGAC CATAGCCGTA ACAAAAAGGG 
TACCCAGCAC ATGGCCCTAC AACCTTAAGA AACGGGAAGC TGCTAGAGTA ACTAGAGATT 
ACTTTGTAAT TCTXAGGCAA AGGCTACATG ATATTGGCCA TCACCTCAAG GCAAATGAGA 
CAATTGTTTC ATCACATTTT TTTGTCTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG 
TGTCCCAATC ACTCAAGAGC ATCGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG 
AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 
ATGACCGTTA CCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAA ATTCTGATCT 
CTCTTGGCTT CACAATCAAT TCAACCATGA CCCGGGATGT AGTCATACCC CTCCTCACAA 
ACAACGACCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATTGGGGGG ATGAATTATC 
TGAATATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 
ATCTCAAGAG AATGATTCTC GCCTCACTAA TGCCTGAAGA GACCCTCCAT CAAGTAATGA
CACAACAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC
TTGTATGTGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCCTGA 
TCCATAGTCC AAACCCAATG TTAAAAGGAT TATTCCATGA TGACAGTAAA GAAGAGGACG 
AGGGACTGGC GGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCATGAAA 
TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTG GATACCACAA 
AAGGCCTGAT TCGAGCCAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT
TGTCCAATTA TGACTATGAA CAATTCAGAG CAGGGATGGT GCTATTGACA GGAAGAAAGA 
GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCGAGAGCT CTAAGAAGCC
ATATGTGGGC GAGGCTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC 
TAGAATCTAT GCGAGGCCAC CTTATTCGGC GTCATGAGAC ATGTGTCATC TGCGAGTGTG 
GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA 
AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA 
TGAAGCTTGC CTTCGTAAGA GCCCCAAGTC GATCCTTGCG ATCTGCTGTT AGAATAGCAA 
CAGTGTACTC ATGGGCTTAC GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG 
CTAGGCAAAG GGCCAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCAACTT 
CGACTAATTT AGCGCATAGG TTGAGGGATC GTAGCACTCA AGTGAAATAC TCAGGTACAT 
CCCTTGTCCG AGTGGCGAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT 
CAGATAAGAA GGTTGATACT AACTTTATAT ACCAACAAGG AATGCTTCTA GGGTTGGGTG 
TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACCGGATC ATCTAACACG GTATTACATC 
TTCACGTCGA AACAGATTGT TGCGTGATCC CGATGATAGA TCATCCCAGG ATACCCAGCT 
CCCGCAAGCT AGAGCTGAGG GCAGAGCTAT GTACCAACCC ATTGATATAT GATAATGCAC 
CTTTAATTGA CAGAGATACA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTGG 
AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTTT AGCTAAGTCC ACAGCACTAT 
CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA 
TAGGGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTCATAGAG CCAAGATTAT 
TCACTATCTA CTTGGGCCAG TGTGCGGCCA TCAATTGGGC ATTTGATGTA CATTATCATA
GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCATC GTTCCTTTCT AGAATGAGCA
AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT 
GGCATTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTGCACA 
CAACTGTGTG CAACATGGTT TACACATGCT ATATGACCTA CCTCGACCTG TTGTTGAATG 
AAGAGTTAGA AGAGTTCACA TTTCTCTTGT GTGAAAGCGA CGAGGATGTA GTACCGGACA 
GATTCGACAA CATCCAGGCA AAACACTTAT GTGTTCTGGC AGATTTGTAC TGTCAACCAG 
GGACCTGCCC ACCAATTCGA GGTCTAAGAC CGGTAGAGAA ATGTGCAGTT CTAACCGACC
ATATCAAGGC AGAGGCTAGG TTATCTCCAG GAGGATCTTC GTGGAACATA AATCCAATTA 
TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTCCGGCG AGGATCGATC AAACAGATAA
GATTGAGAGT TGATCCAGGA TTCATTTTCG ACGCCCTCGC TGAGGTAAAT GTCAGTCAGC 
CAAAGATCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGCTTTCAGA CCCCCACACG 
ATGATGTTGC AAAATTGCTC AAAGATATCA ACACAAGCAA GCACAATCTT CCCATTTCAG 
GGGGCAATCT CGCCAATTAT GAAATCCATG CTTTCCGCAG AATCGGGTTG AACTCATCTG
CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAGG
ACGGCTTGTT CTTGGGTGAG GGATCGGGTT CTATGTTGAT CACTTATAAG GAGATACTTA
AACTAAACAA GTGCTTCTAT AATAGTGGGG TTTCCGCCAA TTCTAGATCT GGTCAAAGGG 
AATTAGCACC CTATCCCTCC GAAGTTGGCC TTGTCGAACA CAGAATGGGA GTAGGTAATA 
TTGTCAAAGT GCTCTTTAAC GGGAGGCCCG AAGTCACGTG GGTAGGCAGT GTAGATTGCT 
TCAATTTCAT AGTTAGTAAT ATCCCTACCT CTAGTGTGGG GTTTATCCAT TCAGATATAG 
AGACCTTGCC TAACAAAGAT ACTATAGAGA AGCTAGAGGA ATTGGCAGCC ATCTTATCGA
TGGCTCTGCT CCTGGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 
GGGATTTTGT TCAGGGATTT ATAAGTTATG TAGGGTCCCA TTATAGAGAA GTGAACCTTG 
TATACCCTAG ATACAGCAAC TTCATATCTA CTGAATCTTA TTTGGTTATG ACAGATCTCA 
AGGCTAACCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCTGTGA
GGACTTCACC TGGACTTATA GGTCACATCC TATCCATTAA GCAACTAAGC TGCATACAAG 
CAATTGTGGG AGACGCAGTT AGTAGAGGTG ATATCAATCC TACTCTGAAA AAACTTACAC
CTATAGAGCA GGTGCTGATC AATTGCGGGT TGGCAATTAA CGGACCTAAG CTGTGCAAAG 15360

AATTGATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAATTCT ATACTCATCC 15420 

TCTACAGGGA GTTGGCAAGA TTCAAAGACA ACCAAAGAAG TCAACAAGGG ATGTTCCACG 15480 

CTTACCCCGT ATTGGTAAGT AGCAGGCAAC GAGAACTTAT ATCTAGGATC ACCCGCAAAT 15540 

TTTGGGGGCA CATTCTTCTT TACTCCGGGA ACAGAAAGTT GATAAATAAG TTTATCCAGA 15600 

ATCTCAAGTC CGGCTATCTG ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT 15660 

CCAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTGAA ACGTGAGTGG GTTTTTAAGG 15720

TAACAGTCAA GGAGACCAAA GAATGGTATA AGTTAGTCGG ATACAGTGCC CTGATTAAGG 15780 

ACTAATTGGT TGAACTCCGG AACCCTAATC CTGCCCTAGG TGGTTAGGCA TTATTTGCAA 15840 

TAGATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 15894 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2183 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu
1 5 10 15

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala
20 25 30

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn
35 40 45

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn
50 55 60

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro
65 70 75 80

Ala His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn
85 90 95

Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys
100 105 110

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asp Lys Val Phe Gln Cys Leu
115 120 125

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp
130 135 140

Ile Lys Glu Lys Val Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln
145 150 155 160

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg
165 170 175

Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr
180 185 190

Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp
195 200 205

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr
210 215 220

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met
225 230 235 240

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Thr Glu Leu Leu Gly
245 250 255

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu
260 265 270

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu
275 280 285

Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe
290 295 300

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly
305 310 315 320

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Ile Glu Ala Leu Asp Tyr
325 330 335

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe
340 345 350

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu
355 360 365

Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr
370 375 380

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr
385 390 395 400

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His
405 410 415

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr
420 425 430

His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe
435 440 445

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu
450 455 460

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr
465 470 475 480

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg
485 490 495

Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp
500 505 510

Val Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe
515 520 525

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg
530 535 540

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala
545 550 555 560

Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly
565 570 575

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala
580 585 590

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro
595 600 605

Val Leu Lys Thr Tyr Ser Arg Ser Pro Val His Thr Ser Thr Arg Asn
610 615 620

Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro Gln Val Ile Arg Gln
625 630 635 640

Asp Gln Asp Thr Asp His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val
645 650 655

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly
675 680 685

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val
690 695 700

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Ile
705 710 715 720

Pro Leu Tyr Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met
725 730 735

Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile
740 745 750

Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser
755 760 765

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro
770 775 780

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Arg Glu Ala Ala Arg Val Thr
785 790 795 800

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His
805 810 815

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr
820 825 830

Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys
835 840 845

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr
850 855 860

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu
865 870 875 880

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val
885 890 895

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met
900 905 910

Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile
915 920 925

Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn
930 935 940

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser
945 950 955 960

Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu
965 970 975

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser
995 1000 1005

Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His
1010 1015 1020

Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu
1025 1030 1035 1040

Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val
1045 1050 1055

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg
1060 1065 1070

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala
1075 1080 1085

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser
1090 1095 1100

Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly
1105 1110 1115 1120

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu
1125 1130 1135

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg
1140 1145 1150

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly
1155 1160 1165

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser
1170 1175 1180

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp
1185 1190 1195 1200

Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr
1205 1210 1215

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser
1220 1225 1230

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala
1235 1240 1245

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg
1250 1255 1260

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile
1265 1270 1275 1280

Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln
1285 1290 1295

Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr
1300 1305 1310

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp
1315 1320 1325

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu
1330 1335 1340

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val
1345 1350 1355 1360

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp
1365 1370 1375

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu
1380 1385 1390

Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp
1395 1400 1405

Thr Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe
1410 1415 1420

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr
1425 1430 1435 1440

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met
1445 1450 1455

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile
1460 1465 1470

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly
1475 1480 1485

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro
1490 1495 1500

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg
1505 1510 1515 1520

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro
1525 1530 1535

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His
1540 1545 1550

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met
1555 1560 1565

Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu
1570 1575 1580

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val
1585 1590 1595 1600

Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala
1605 1610 1615

Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg
1620 1625 1630

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala
1635 1640 1645

Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val
1650 1655 1660

Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys
1665 1670 1675 1680

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala
1685 1690 1695

Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn
1700 1705 1710

Met Ser Ile Lys Ala Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu
1715 1720 1725

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly
1730 1735 1740

Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn
1745 1750 1755 1760

Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg
1765 1770 1775

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly
1780 1785 1790

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe
1795 1800 1805

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu
1810 1815 1820

Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val
1825 1830 1835 1840

Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp
1845 1850 1855

Val Gly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr
1860 1865 1870

Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asn Lys
1875 1880 1885

Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala
1890 1895 1900

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro
1905 1910 1915 1920

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His
1925 1930 1935

Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser
1940 1945 1950

Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met
1955 1960 1965

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr
1970 1975 1980



Ser Pro Gly Leu He Gly His He 
1985 1990 

He Gin Ala He Val Gly Asp Ala 
2005 

Thr Leu Lys Lys Leu Thr Pro He 
2020 



Leu Ser He Lys Gin Leu Ser Cys 
1995 2000 

Val Ser Arg Gly Asp He Asn Pro 
2010 2015 

Glu Gin Val Leu He Asn Cys Gly 
2025 2030 
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Lou Ala He Asn Gly Pro Lys Leu Cys Lys Glu Leu He His His Asp 
2035 2040 2045 

Val Ala Ser Gly Gin Asp Gly Leu Leu Asn Ser He Leu He Leu Tyr 
2050 2055 2060 

Arg Glu Leu Ala Arg Phe Lys Asp Asn Gin Arg Ser Gin Gin Gly Met 
2065 2070 2075 2080 

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gin Arg Glu Leu He 
2085 2090 2095 

Ser Arg He Thr Arg Lys Phe Trp Gly His He Leu Leu Tyr Ser Gly 
2100 2105 2110 

Asn Arg Lys Leu He Asn Lys Phe He Gin Asn Leu Lys Ser Gly Tyr 
2115 2120 2125 

Leu He Leu Asp Leu His Gin Asn He Phe Val Lys Asn Leu Ser Lys 
2130 2135 2140 

Ser Glu Lys Gin He He Met Thr Gly Gly Leu Lys Arg Glu Trp Val 
2145 2150 2155 2160 

Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly 
2165 2170 2175 



Tyr Ser Ala Leu He Lys Asp 
21B0 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15894 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

ACCAAACAAA GTTGGGTAAG GATAGTTCAA TCAATGATCA TCTTCTAGTG CACTTAGGAT 60

TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAGGGAT ATCCGAGATG GCCACACTTT 120 

TAAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 180

GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATCCCTGGA GATTCCTCAA 240

TTACCACTCG ATCCAGACTT CTGGACCGGT TGGTCAGGTT AATTGGAAAC CCGGATGTGA 300

GCGGGCCCAA ACTAACAGGG GCACTAATAG GTATATTATC CTTATTTGTG GAGTCTCCAG 360

GTCAATTGAT TCAGAGGATC ACCGATGACC CTGACGTTAG CATAAGGCTG TTAGAGGTTG 420

TCCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG 480

ATGAGGCGGA CAAATACTTT TCACATGATG ATCCAATTAG TAGTGATCAA TCCAGGTTCG 540

GATGGTTCGA GAACAAGGAA ATCTCAGATA TTGAAGTGCA AGACCCTGAG GGATTCAACA 600

TGATTCTGGG TACCATCCTA GCCCAAATTT GGGTCTTGCT CGCAAAGGCG GTTACGGCCC 660

CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG 720

TAGTTGGTGA ATTTAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG 780

AGGACCTCTC CTTACGCCGA TTCATGGTCG CTCTAATCCT GGATATCAAG AGAACACCCG 840

GAAACAAACC CAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG 900

GATTAGCCAG TTTTATCCTG ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG 960

GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAC CTTTACCAGC 1020

AAATGGGGGA AACTGCACCC TACATGGTAA TCCTGGAGAA CTCAATTCAG AACAAGTTCA 1080

GTGCAGGATC ATACCCTCTG CTCTGGAGCT ATGCCATGGG AGTAGGAGTG GAACTTGAAA 1140

ACTCCATGGG AGGTTTGAAC TTTGGCCGAT CTTACTTTGA TCCAGCATAT TTTAGATTAG 1200

GGCAAGAGAT GGTAAGGAGG TCAGCTGGAA AGGTCAGTTC CACATTGGCA TCTGAACTCG 1260

GTATCACTGC CGAGGATGCA AGGCTTGTTT CAGAGATTGC AATGCATACT ACTGAGGACA 1320

AGATCAGTAG AGCGGTTGGA CCCAGACAAG CCCAAGTATC ATTTCTACAC GGTGATCAAA 1380

GTGAGAATGA GCTACCGAGA TTGGGGGGCA AGGAAGATAG GAGGGTCAAA CAGAGTCGAG 1440

GAGAAGCCAG GGAGAGCTAC AGAGAAACCG GGCCCAGCAG AGCAAGTGAT GCGAGAGCTG 1500

CCCATCTTCC AACCGGCACA CCCCTAGACA TTGACACTGC ATCGGAGTCC AGCCAAGATC 1560

CGCAGGACAG TCGAAGGTCA GCTGACGCCC TGCTTAGGCT GCAAGCCATG GCAGGAATCT 1620

CGGAAGAACA AGGCTCAGAC ACGGACACCC CTATAGTGTA CAATGACAGA AATCTTCTAG 1680

ACTAGGTGCG AGAGGCCGAG GACCAGAACA ACATCCGCCT ACCCTCCATC ATTGTTATAA 1740

AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCCATCA ACCATCCACT CCCACGATTG 1800

GAGCCGATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT 1860

CTCAAGGCCG AGCCCATCGG CTCACTGGCC ATCGAGGAAG CTATGGCAGC ATGGTCAGAA 1920

ATATCAGACA ACCCAGGACA GGAGCGAGCC ACCTGCAGGG AAGAGAAGGC AGGCAGTTCG 1980

GGTCTCAGCA AACCATGCCT CTCAGCAATT GGATCAACTG AAGGCGGTGC ACCTCGCATC 2040

CGCGGTCAGG GACCTGGAGA GAGCGATGAC GACGCTGAAA CTTTGGGAAT CCCCCCAAGA 2100

AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATTATG TTTATGATCA CAGCGGTGAA 2160

GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT 2220

AGCACCCTAT CAGGAGGAGA CAATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT 2280

GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG 2340

GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAG ACTCCAATCC 2400

AGAGGCAACA ACTTTCCGAA GCTTGGGAAA ACTCTCAATG TTCCTCCGCC CCCGGACCCC 2460

GGTAGGGCCA GCACTTCCGG GACACCCATT AAAAAGGGCA CAGACGCGAG ATTAGCCTCA 2520

TTTGGAACGG AGATCGCGTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA 2580

CCCTCGGAAC CATCAGGGCC AGGTGCACCT GCGGGGAATG TCCCCGAGTA TGTGAGCAAT 2640

GCCGCACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG 2700

AATAATGAAG AAGGGGGAGA CTATTATGAT GATGAGCTGT TCTCTGATGT CCAAGATATT 2760

AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCTCCAA GCTAGAATCA 2820

CTGCTGTTAT TGAAGGGAGA AGTTGAGTCA ATTAAGAAGC AGATCAACAG GCAAAATATC 2880

AGCATATCCA CCCTGGAAGG ACACCTCTCA AGCATCATGA TCGCCATTCC TGGACTTGGG 2940

AAGGATCCCA ACGACCCCAC TGCAGATGTC GAAATCAATC CCGACTTGAA ACCCATCATA 3000

GGCAGAGATT CAGGCCGAGC ACTGGCCGAA GTTCTCAAGA AACCCGTTGC CAGCCGACAA 3060

CTCCAAGGAA TGACAAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAG 3120

CCAAAGCCGA TCGGGAAAAA GATGAGCTCA GCCGTCGGGT TTGTTCCTGA CACCGGCCCT 3180

GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 3240

CGTTACCTGA TGACTCTCCT TGATGATATC AAAGGAGCCA ATGATCTTGC CAAGTTCCAC 3300

CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCCCATG
CCAGTCGACC CAACTAGTAC AACCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT
GCCTCCCAAG TTCCACAATG ACAGAGATCT ACGACTTCGA CAAGTCGGCA TGGGACATCA
AAGGGTCGAT CGCTCCGATA CAACCGACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG
TCAGAGTCAT AGATCCTGGT CTAGGCGACA GGAAGGATGA ATGCTTTATG TACATGTCTC
TGCTGGGGGT TGTTGAGGAC AGCGATCCCC TAGGGCCTCC AATCGGGCGA GCATTTGGGT
CCCTGCCCTT AGGTGTTGGC AGATCCACAG CAAAGCCCGA AAAACTCCTC AAAGAGGCCA
CTGAGCTTGA CATAGTTGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA
ACAACACCCC ACTAACTCTC CTCACACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT
TCAACGCAAA CCAAGTGTGC AATGCGGTTA ATCTGATACC GCTCGATACC CCGCAGAGGT
TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCGGATAA CGGGTATTAC ACCGTTCCTA
GAAGAATGCT GGAATTCAGA TCGGTCAATG CAGTGGCCTT CAACCTGCTG GTGACCCTTA
GGATTGACAA GGCGATAGGC CCTGGGAAGA TCATCGACAA TACAGAGCAA CTTCCTGAGG
CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA GAGTGAAGTC TACTCTGCCG
ATTATTGCAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG
GCACCAGTCT TCACATTAGA AGCACAGGCA AAATGAGCAA GACTCTCCAT GCACAACTCG
GGTTCAAGAA GACCTTATGT TACCCGCTGA TGGATATCAA TGAAGACCTT AATCGATTAC
TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCTC
AAGAATTCCG CATTTACGAC GACGTGATCA TAAATGATGA CCAAGGACTA TTCAAAGTTC
TGTAGACCGT AGTGCCCAGC AATGCCCGAA AACGACCCCC CTCACAATGA CAGCCAGAAG
GCCCGGACAA AAAAGCCCCC TCCGAAAGAC TCCACTGACC AAGCGAGAGG CCAGCCAGCA
GCCGACGGCA AGCACGAACA CCAGGCGGCC CCAGCACAGA ACAGCCCTGA TACAAGGCCA
CCACCAGCCA CCCCAATCTG CATCCTCCTC GTGGGACCCC CGAGGACCAA CCCCCAAGGC
TGCCCCCGAT CCAAACCACC AACCGCATCC CCACCACCCC CGGGAAAGAA ACCCCCAGCA
ATTGGAAGGC CCCTCCCCCT CTTCCTCAAC ACAAGAACTC CACAACCGAA CCGCACAAGC
GACCGAGGTG ACCCAACCGC AGGCATCCGA CTCCCTAGAC AGATCCTCTC TCCCCGGCAA
ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCAAC AGAACCCAGA CCCCGGCCCA
CGGCGCCGCG CCCCCAACCC CCGACAACCA GAGGGAGCCC CCAACCAATC CCGCCGGCTC
CCCCGGTGCC CACAGGCAGG GACACCAACC CCCGAACAGA CCCAGCACCT AACCATCGAC 
AATCCAAGAC GGGGGGGCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCACCGC 
GAGGAAGCCC ACCCACCCCA CACACGACCA CGGCAACCAA ACCAGAACCC AGACCACCCT 
GGGCCACCAG CTCCCAGACT CGGCCATCAC CCCGCAGAAA GGAAAGGCCA CAACCCGCGC 
ACCCCAGCCC CGATCCGGCG GGGAGCCACC CAACCCGAAC CAGCACCCAA GAGCGATCCC 
CGAAGGACCC CCGAACCGCA AAGGACATCA GTATCCCACA GCCTCTCCAA GTCCCCCGGT 
CTCCTCCTCT TCTCGAAGGG ACCAAAAGAT CAATCCACCA CACCCGACGA CACTCAACTC 
CCCACCCCTA AAGGAGACAC CGGGAATCCC AGAATCAAGA CTCATCCAAT GTCCATCATG 
GGTCTCAAGG TGAACGTCTC TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAAACACCC 
ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGAAT AGGAAGTGCA 
AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTAG TCATAAAATT AATGCCCAAT 
ATAACTCTCC TCAATAACTG CACGAGGGTA GAGATTGCAG AATACAGGAG ACTACTGAGA 
ACAGTTTTGG AACCAATTAG AGATGCACTT AATGCAATGA CCCAGAATAT AAGACCGGTT 
CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCGG GAGTAGTCCT GGCAGGTGCG 
GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 
CTGAACTCTC AAGCCATCGA CAATCTGAGA GCGAGCCTGG AAACTACTAA TCAGGCAATT 
GAGGCAATCA GACAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC 
ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCCAGAAG 
CTCGGGCTCA AATTGCTCAG ATACTATACA GAAATCCTGT CATTATTTGG CCCCAGCTTA 
CGGGACCCCA TATCTGCGGA GATATCTATC CAGGCTTTGA GCTATGCGCT TGGAGGAGAC 
ATCAATAAGG TGTTAGAAAA GCTCGGATAC AGTGGAGGTG ATTTACTGGG CATCTTAGAG 
AGCAGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTCCTC 
AGTATAGCCT ATCCGACGCT GTCCGAGATT AAGGGGGTGA TTGTCCACCG GCTAGAGGGG 
GTCTCGTACA ACATAGGCTC TCAAGAGTGG TATACCACTG TGCCCAAGTA TGTTGCAACC
CAAGGGTACC TTATCTCGAA TTTTGATGAG TCATCGTGTA CTTTCATGCC AGAGGGGACT
GTGTGCAGCC AAAATGCCTT GTACCCGATG AGTCCTCTGC TCCAAGAATG CCTCCGGGGG 
TACACCAAGT CCTGTGCTCG TACACTCGTA TCCGGGTCTT TTGGGAACCG GTTCATTTTA 
TCACAAGGGA ACCTAATAGC CAATTGTGCA TCAATCCTTT GCAAGTGTTA CACAACAGGA 
ACGATCATTA ATCAAGACCC TGACAAGATC CTAACATACA TTGCTGCCGA TAACTGCCCG 
GTAGTCGAGG TGAACGGCGT GACCATCCAA GTCGGGAGCA GGAGGTATCC AGACGCTGTG 
TACTTGCACA GAATTGACCT CGGTCCTCCC ATATTATTGG AGAGGTTGGA CGTAGGGACA 
AATCTGGGGA ATGCAATTGC TAAGTTGGAG GATGCCAAGG AATTGTTGGA GTCATCGGAC 
CAGATATTGA GGAGTATGAA AGGTTTATCG AGCACTTGCA TAGTCTACAT CCTGATTGCA 
GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT 
AACAAAAAGG GAGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACGGGA 
ACATCAAAAT CCTATGTAAG GTCGCTCTGA TCCTCTACAA CTCTTGAAAC ACAAATGTCC 
CACAAGTCTC CTCTTCGTCA TCAAGCAACC ACCGCACCCA GCATCAAGCC CACCTGAAAT 
TATCTCCGGC TTCCCTCTGG CCGAACAATA TCGGTAGTTA ATTAAAACTT AGGGTGCAAG 
ATCATCCACA ATGTCACCAC AACGAGACCG GATAAATGCC TTCTACAAAG ATAACCCCCA 
TCCCAAGGGA AGTAGGATAG TCATTAACAG AGAACATCTT ATGATTGATA GACCTTATGT 
TTTGCTGGCT GTTCTGTTTG TCATGTTTCT GAGCTTGATC GGGTTGCTAG CCATTGCAGG 
CATTAGACTT CATCGGGCAG CCATCTACAC CGCAGAGATC CATAAAAGCC TCAGCACCAA 
TCTAGATGTA ACTAACTCAA TCGAGCATCA GGTCAAGGAC GTGCTGACAC CACTCTTCAA 
AATCATCGGT GATGAAGTGG GCCTGAGGAC ACCTCAGAGA TTCACTGACC TAGTGAAATT 
CATCTCTGAC AAGATTAAAT TCCTTAATCC GGATAGGGAG TACGACTTCA GAGATCTCAC
TTGGTGTATC AACCCGCCAG AGAGAATCAA ATTGGATTAT GATCAATACT GTGCAGATGT 
GGCTGCTGAA GAGCTCATGA ATGCATTGGT GAACTCAACT CTACTGGAGA CCAGAACAAC 
CAATCAGTTC CTAGCTGTCT CAAAGGGAAA CTGCTCAGGG CCCACTACAA TCAGAGGTCA 
ATTCTCAAAC ATGTCGCTGT CCCTGTTAGA CTTGTATTTA GGTCGAGGTT ACAATGTGTC 
ATCTATAGTC ACTATGACAT CCCAGGGAAT GTATGGGGGA ACTTACCTAG TGGAAAAGCC 
TAATCTGAGC AGCAAAAGGT CAGAGTTGTC ACAACTGAGC ATGTACCGAG TGTTTGAAGT 8040

AGGTGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATCTTGA 8100

GCAACCAGTC AGTAATGATC TCAGCAACTG TATGGTGGCT TTGGGGGAGC TCAAACTCGC 8160 

AGCCCTTTGT CACCGGGAAG ATTCTATCAC AATTCCCTAT CAGGGATCAG GGAAAGGTGT 8220 

CAGCTTCCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT 8280 

CACCTTATCA ACGGATGATC CAGTGATAGA CAGGCTTTAC CTCTCATCTC ACAGAGGTGT 8340 

TATCGCTGAC AATCAAGCAA AATGGGCTGT CCCGACAACA CGAACAGATG ACAAGTTGCG 8400 

AATGGAGACA TGCTTCCAAC AGGCGTGTAA GGGTAAAATC CAAGCACTCT GCGAGAATCC 8460 

CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGGGTCTTGT CTGTTGATCT 8520 

GAGTCTGACA GTTGAGCTTA AAATCAAAAT TGCTTCGGGA TTCGGGCCAT TGATCACACA 8580 

CGGTTCAGGG ATGGACCTAT ACAAATCCAA CCACAACAAT GTGTATTGGC TGACTATCCC 8640 

ACCAATGAAG AACCTAGCCT TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA 8700 

GGTTAGTCCC TACCTCTTCA ATGTCCCAAT TAAGGAAGCA GGCGAAGACT GCCATGCCCC 8760 

AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTGATTCT 8820 

ACCTGGTCAA GATCTCCAAT ATGTTTTGGC AACCTACGAT ACTTCCAGGG TTGAACATGC 8880 

TGTGGTTTAT TACGTTTACA GCCCAAGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT 8940 

GCCTATAAAG GGGGTCCCCA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT 9000

CTGGTGCCGT CACTTCTGTG TGCTTGCGGA CTCAGAATCT GGTGGACATA TCACTCACTC 9060 

TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACCCGG GAAGATGGAA CCAATCGCAG 9120 

ATAGGGCTGC TAGTGAACTA ATCTCATGAT GTCACCCAGA CATCAGGCAT ACCCACTAGT 9180 

GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTCCCC GTTATGGACT 9240 

CGCTATCTGT CAACCAGATC TTATACCCTG AAGTTCACCT AGATAGCCCG ATAGTTACCA 9300 

ATAAGATAGT AGCCATCCTG GAGTATGCTC GAGTCCCTCA CGCTTACAGC CTGGAGGACC 9360 

CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTTTCCAAC CAAATGATTA 9420 

TAAACAATGT GGAAGTTGGG AATGTCATCA AGTCCAAGCT TAGGAGTTAT CCGGCCCACT 9480 

CTCATATTCC ATATCCAAAT TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA 9540 
CGAGGAAGAT CCGTGAACTC CTCAAAAAGG GGAATTCGCT GTACTCCAAA GTCAGTGATA 9600

AGGTTTTCCA ATGCTTAAGG GACACTAACT CACGGCTTGG CCTAGGCTCC GAATTGAGGG 9660 

AGGACATCAA GGAGAAAGTT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAGTGGTTTG 9720 

AGCCCTTTCT GTTTTGGTTT ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA 9780 

CCCATACTTG CCATAGGAGG AGACACACAC CTGTATTCTT CACTGGTAGT TCAGTTGAGT 9840 

TGCTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAAGA GTCTCAACAT GTATATTACC 9900 

TGACATTTGA ACTGGTTTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA 9960 

CCGCTATGAC TATTGATGCT AGGTATACAG AGCTTCTAGG AAGAGTCAGA TACATGTGGA 10020 

AACTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTATCAAATT GTAGCCATGC 10080 

TGGAGCCTCT TTCACTTGCT TACCTGCAGC TGAGGGATAT AACAGTAGAA CTCAGAGGTG 10140 

CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG 10200

ATGAAGGTAC TTATCATGAG TTAATTGAAG CTCTAGATTA CATTTTCATA ACTGATGACA 10260 

TACATCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG 10320 

CAGTAACGGC TGCTGAAAAT GTTAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG 10380 

AGACTCTGAT GAAAGGTCAT GCCATATTTT GTGGAATCAT AATCAACGGC TATCGTGACA 10440 

GGCACGGAGG CAGTTGGCCA CCGCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA 10500

ATGCTCAAGC TTCAGGTGAA GGGTTAACAC ATGAGCAGTG CGTTGATAAC TGGAAATCTT 10560 

TTGCTGGAGT GAAATTTGGC TGCTTTATGC CTCTTAGCCT GGATAGTGAT CTGACAATGT 10620 

ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG 10680

AGTTCCTGCG TTACGACCCT CCCAAGGGAA CCGGGTCACG GAGGCTTGTA GATGTTTTCC 10740 

TTAATGATTC GAGCTTTGAC CCATATGATG TGATAATGTA TGTTGTAAGT GGAGCTTACC 10800 

TCCATGACCC TGAGTTCAAC CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG 10860

GTAGACTTTT TGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATT GCTGAAAATC 10920 

TAATCTCAAA CGGGATTGGC AAATATTTTA AGGACAATGG GATGGCCAAG GATGAGCACG 10980 

ATTTGACTAA GGCACTCCAC ACTCTAGCTG TCTCAGGAGT CCCCAAAGAT CTCAAAGAAA 11040 

GTCACAGGGG GGGGCCAGTC TTAAAAACCT ACTCCCGAAG CCCAGTCCAC ACAAGTACCA 11100 
GGAACGTGAG AGCAGCAAAA GGGTTTATAG GGTTCCCTCA AGTAATTCGG CAGGACCAAG 11160

ACACTGATCA TCCGGAGAAT ATGGAAGCTT ACGAGACAGT CAGTGCATTT ATCACGACTG 11220 

ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTGTTT GCACAGAGGC 11280 

TAAATGAGAT TTACGGATTG CCCTCATTTT TCCAGTGGCT GCATAAGAGG CTTGAGACCT 11340 

CTGTCCTGTA TGTAAGTGAC CCTCATTGCC CCCCCGACCT TGACGCCCAT ATCCCGTTAT 11400 

ATAAAGTCCC CAATGATCAA ATCTTCATTA AGTACCCTAT GGGAGGTATA GAAGGGTATT 11460 

GTCAGAAGCT GTGGACCATC AGCACCATTC CCTATCTATA CCTGGCTGCT TATGAGAGCG 11520 

GAGTAAGGAT TGCTTCGTTA GTGCAAGGGG ACAATCAGAC CATAGCCGTA ACAAAAAGGG 11580 

TACCCAGCAC ATGGCCCTAC AACCTTAAGA AACGGGAAGC TGCTAGAGTA ACTAGAGATT 11640 

ACTTTGTAAT TCTTAGGCAA AGGCTACATG ATATTGGCCA TCACCTCAAG GCAAATGAGA 11700 

CAATTGTTTC ATCACATTTT TTTGTCTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG 11760 

TGTCCCAATC ACTCAAGAGC ATCGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG 11820 

AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 11880

ATGACCGTTA CCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAA ATTCTGATCT 11940 

CTCTTGGCTT CACAATCAAT TCAACCATGA CCCGGGATGT AGTCATACCC CTCCTCACAA 12000 

ACAACGACCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATTGGGGGG ATGAATTATC 12060 

TGAATATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 12120 

ATCTCAAGAG AATGATTCTC GCCTCACTAA TGCCTGAAGA GACCCTCCAT CAAGTAATGA 12180 

CACAACAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC 12240 

TTGTATGTGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCCTGA 12300 

TCCATAGTCC AAACCCAATG TTAAAAGGAT TATTCCATGA TGACAGTAAA GAAGAGGACG 12360 

AGGGACTGGC GGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCATGAAA 12420 

TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTG GATACCACAA 12480 

AAGGCCTGAT TCGAGCGAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT 12540 

TGTCCAATTA TGACTATGAA CAATTCAGAG CAGGGATGGT GCTATTGACA GGAAGAAAGA 12600 

GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCGAGAGCT CTAAGAAGCC 12660 
ATATGTGGGC GAGGCTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC 12720
TAGAATCTAT GCGAGGCCAC CTTATTCGGC GTCATGAGAC ATGTGTCATC TGCGAGTGTG 12780
GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA 12840
AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA 12900
TGAAGCTTGC CTTCGTAAGA GCCCCAAGTC GATCCTTGCG ATCTGCTGTT AGAATAGCAA 12960 
CAGTGTACTC ATGGGCTTAC GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG 13020 
CTAGGCAAAG GGCCAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCAACTT 13080 
CGACTAATTT AGCGCATAGG TTGAGGGATC GTAGCACTCA AGTGAAATAC TCAGGTACAT 13140
CCCTTGTCCG AGTGGCGAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT 13200 
CAGATAAGAA GGTTGATACT AACTTTATAT ACCAACAAGG AATGCTTCTA GGGTTGGGTG 13260 
TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACCGGATC ATCTAACACG GTATTACATC 13320 
TTCACGTCGA AACAGATTGT TGCGTGATCC CGATGATAGA TCATCCCAGG ATACCCAGCT 13380
CCCGCAAGCT AGAGCTGAGG GCAGAGCTAT GTACCAACCC ATTGATATAT GATAATGCAC 13440 
CTTTAATTGA CAGAGATACA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTGG 13500 

AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTTT AGCTAAGTCC ACAGCACTAT 13560

CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA 13620 

TAGCGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTCATAGAG CCAAGATTAT 13680 

TCACTATCTA CTTGGGCCAG TGTGCGGCCA TCAATTGGGC ATTTGATGTA CATTATCATA 13740

GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCATC GTTCCTTTCT AGAATGAGCA 13800

AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT 13860

GGCATTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTGCACA 13920 

CAACTGTGTG CAACATGGTT TACACATGCT ATATGACCTA CCTCGACCTG TTGTTGAATG 13980 

AAGAGTTAGA AGAGTTCACA TTTCTCTTGT GTGAAAGCGA CGAGGATGTA GTACCGGACA 14040 

GATTCGACAA CATCCAGGCA AAACACTTAT GTGTTCTGGC AGATTTGTAC TCTCAACCAG 14100 

GGGCCTGCCC ACCAATTCGA GGTCTAAGAC CGGTAGAGAA ATGTGCAGTT CTAACCGACC 14160

ATATCAAGGC AGAGGCTAGG TTATCTCCAG CAGGATCTTC GTGGAACATA AATCCAATTA 14220 
TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTCCGGCG AGGATCGATC AAACAGATAA 14280
GATTGAGAGT TGATCCAGGA TTCATTTTCG ACGCCCTCGC TGAGGTAAAT GTCAGTCAGC 14340
CAAAGATCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGCTTTCAGA CCCCCACACG 14400
ATGATGTTGC AAAATTGCTC AAAGATATCA ACACAAGCAA GCACAATCTT CCCATTTCAG 14460
GGGGCAATCT CGCCAATTAT GAAATCCATG CTTTCCGCAG AATCGGGTTG AACTCATCTG 14520
CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAGG 14580
ACGGCTTGTT CTTGGGTGAG GGATCGGGTT CTATGTTGAT CACTTATAAG GAGATACTTA 14640
AACTAAACAA GTGCTTCTAT AATAGTGGGG TTTCCGCCAA TTCTAGATCT GGTCAAAGGG 14700
AATTAGCACC CTATCCCTCC GAAGTTGGCC TTGTCGAACA CAGAATGGGA GTAGGTAATA 14760
TTGTCAAAGT GCTCTTTAAC GGGAGGCCCG AAGTCACGTG GGTAGGCAGT GTAGATTGCT 14820
TCAATTTCAT AGTTAGTAAT ATCCCTACCT CTAGTGTGGG GTTTATCCAT TCAGATATAG 14880
AGACCTTGCC TAACAAAGAT ACTATAGAGA AGCTAGAGGA ATTGGCAGCC ATCTTATCGA 14940
TGGCTCTGCT CCTGGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 15000
GGGATTTTGT TCAGGGATTT ATAAGTTATG TAGGGTCTTA TTATAGAGAA GTGAACCTTG 15060
TATACCCTAG ATACAGCAAC TTCATATCTA CTGAATCTTA TTTGGTTATG ACAGATCTCA 15120
AGGCTAACCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCTGTGA 15180
GGACTTCACC TGGACTTATA GGTCACATCC TATCCATTAA GCAACTAAGC TGCATACAAG 15240
CAATTGTGGG AGACGCAGTT AGTAGAGGTG ATATCAATCC TACTCTGAAA AAACTTACAC 15300
CTATAGAGCA GGTGCTGATC AATTGCGGGT TGGCAATTAA CGGACCTAAG CTGTGCAAAG 15360
AATTGATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAATTCT ATACTCATCC 15420
TCTACAGGGA GTTGGCAAGA TTCAAAGACA ACCGAAGAAG TCAACAAGGG ATGTTCCACG 15480
CTTACCCCGT ATTGGTAAGT AGCAGGCAAC GAGAACTTAT ATCTAGGATC ACCCGCAAAT 15540
TTTGGGGGCA CATTCTTCTT TACTCCGGGA ACAGAAAGTT GATAAATAAG TTTATCCAGA 15600
ATCTCAAGTC CGGCTATCTG ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT 15660
CCAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTGAA ACGTGAGTGG GTTTTTAAGG 15720
TAACAGTCAA GGAGACCAAA GAATGGTATA AGTTAGTCGG ATACAGTGCC CTGATTAAGG 15780
ACTAATTGAT TGAACTCCGG AACCCTAATC CTGCCCTAGG TGGTTAGGCA TTATTTGCAA 15840

TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 15894

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2183 amino acids

(B) TYPE: amino acid
(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu
1 5 10 15

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala
20 25 30

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn
35 40 45

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn
50 55 60

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro
65 70 75 80

Ala His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn
85 90 95

Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys
100 105 110

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asp Lys Val Phe Gln Cys Leu
115 120 125

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp
130 135 140

Ile Lys Glu Lys Val Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln
145 150 155 160

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg
165 170 175
Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr
180 185 190

Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp
195 200 205

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr
210 215 220

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met
225 230 235 240

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Thr Glu Leu Leu Gly
245 250 255

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu
260 265 270

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu
275 280 285

Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe
290 295 300

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly
305 310 315 320

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Ile Glu Ala Leu Asp Tyr
325 330 335

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe
340 345 350

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu
355 360 365

Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr
370 375 380

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr
385 390 395 400

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His
405 410 415

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr
420 425 430

His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe
435 440 445

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu
450 455 460

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr
465 470 475 480

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg
485 490 495

Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp
500 505 510

Val Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe
515 520 525

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg
530 535 540

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala
545 550 555 560

Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly
565 570 575

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala
580 585 590

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro
595 600 605

Val Leu Lys Thr Tyr Ser Arg Ser Pro Val His Thr Ser Thr Arg Asn
610 615 620

Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro Gln Val Ile Arg Gln
625 630 635 640

Asp Gln Asp Thr Asp His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val
645 650 655

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly
675 680 685

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val
690 695 700

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Ile
705 710 715 720

Pro Leu Tyr Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met
725 730 735
Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile
740 745 750

Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser
755 760 765

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro
770 775 780

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Arg Glu Ala Ala Arg Val Thr
785 790 795 800

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His
805 810 815

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr
820 825 830

Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys
835 840 845

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr
850 855 860

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu
865 870 875 880

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val
885 890 895

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met
900 905 910

Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile
915 920 925

Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn
930 935 940

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser
945 950 955 960

Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu
965 970 975

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser
995 1000 1005
Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His
1010 1015 1020

Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu
1025 1030 1035 1040

Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val
1045 1050 1055

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg
1060 1065 1070

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala
1075 1080 1085

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser
1090 1095 1100

Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly
1105 1110 1115 1120

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu
1125 1130 1135

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg
1140 1145 1150

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly
1155 1160 1165

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser
1170 1175 1180

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp
1185 1190 1195 1200

Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr
1205 1210 1215

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser
1220 1225 1230

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala
1235 1240 1245

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg
1250 1255 1260

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile
1265 1270 1275 1280
Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln
1285 1290 1295

Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr
1300 1305 1310

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp
1315 1320 1325

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu
1330 1335 1340

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val
1345 1350 1355 1360

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp
1365 1370 1375

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu
1380 1385 1390

Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp
1395 1400 1405

Thr Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe
1410 1415 1420

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr
1425 1430 1435 1440

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met
1445 1450 1455

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile
1460 1465 1470

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly
1475 1480 1485

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro
1490 1495 1500

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg
1505 1510 1515 1520

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro
1525 1530 1535

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His
1540 1545 1550

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met
1555 1560 1565
Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu
1570 1575 1580

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val
1585 1590 1595 1600

Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala
1605 1610 1615

Asp Leu Tyr Cys Gln Pro Gly Ala Cys Pro Pro Ile Arg Gly Leu Arg
1620 1625 1630

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala
1635 1640 1645

Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val
1650 1655 1660

Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys
1665 1670 1675 1680

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala
1685 1690 1695

Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn
1700 1705 1710

Met Ser Ile Lys Ala Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu
1715 1720 1725

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly
1730 1735 1740

Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn
1745 1750 1755 1760

Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg
1765 1770 1775

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly
1780 1785 1790

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe
1795 1800 1805

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu
1810 1815 1820

Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val
1825 1830 1835 1840
Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp
1845 1850 1855

Val Gly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr
1860 1865 1870

Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asn Lys
1875 1880 1885

Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala
1890 1895 1900

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro
1905 1910 1915 1920

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser Tyr
1925 1930 1935

Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser
1940 1945 1950

Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met
1955 1960 1965

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr
1970 1975 1980

Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys
1985 1990 1995 2000

Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Asp Ile Asn Pro
2005 2010 2015

Thr Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly
2020 2025 2030

Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp
2035 2040 2045

Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr
2050 2055 2060

Arg Glu Leu Ala Arg Phe Lys Asp Asn Arg Arg Ser Gln Gln Gly Met
2065 2070 2075 2080

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile
2085 2090 2095

Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly
2100 2105 2110

Asn Arg Lys Leu Ile Asn Lys Phe Ile Gln Asn Leu Lys Ser Gly Tyr
2115 2120 2125

Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys
2130 2135 2140

Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val
2145 2150 2155 2160

Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly
2165 2170 2175

Tyr Ser Ala Leu Ile Lys Asp
2180

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15462 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

ACCAAACAAG AGAAGAAACT TGTCTGGGAA TATAAATTTA ACTTTAAATT AACTTAGGAT 60 

TAAAGACATT GACTAGAAGG TCAAGAAAAG GGAACTCTAT AATTTCAAAA ATGTTGAGCC 120 

TATTTGATAC ATTTAATGCA CGTAGGCAAG AAAACATAAC AAAATCAGCC GGTGGAGCTA 180

TCATTCCTGG ACAGAAAAAT ACTGTCTCTA TATTCGCCCT TGGACCGACA ATAACTGATG 240 

ATAATGAGAA AATGACATTA GCTCTTCTAT TTCTATCTCA TTCACTAGAT AATGAGAAAC 300

AACATGCACA AAGGGCAGGG TTCTTGGTGT CTTTATTGTC AATGGCTTAT GCCAATCCAG 360

AGCTCTACCT AACAACAAAT GGAAGTAATG CAGATGTCAA GTATGTCATA TACATGATTG 420 

AGAAAGATCT AAAACGGCAA AAGTATGGAG GATTTGTGGT TAAGACGAGA GAGATGATAT 480 

ATGAAAAGAC AACTGATTGG ATATTTGGAA GTGACCTGGA TTATGATCAG GAAACTATGT 540 

TGCAGAACGG CAGGAACAAT TCAACAATTG AAGACCTTGT CCACACATTT GGGTATCCAT 600 

CATGTTTAGG AGCTCTTATA ATACAGATCT GGATAGTTCT GGTCAAAGCT ATCACTAGTA 660 

TCTCAGGGTT AAGAAAAGGC TTTTTCACCC GATTGGAAGC TTTCAGACAA GATGGAACAG 720
TGCAGGCAGG GCTGGTATTG AGCGGTGACA CAGTGGATCA GATTGGGTCA ATCATGCGGT
CTCAACAGAG CTTGGTAACT CTTATGGTTG AAACATTAAT AACAATGAAT ACCAGCAGAA 
ATGACCTCAC AACCATAGAA AAGAATATAC AAATTGTTGG CAACTACATA AGAGATGCAG 
GTCTCGCTTC ATTCTTCAAT ACAATCAGAT ATGGAATTGA GACCAGAATG GCAGCTTTGA 
CTCTATCCAC TCTCAGACCA GATATCAATA GATTAAAAGC TTTGATGGAA CTGTATTTAT 
CAAAGGGACC ACGCGCTCCT TTCATCTGTA TCCTCAGAGA TCCTATACAT GGTGAGTTCG 
CACCAGGCAA CTATCCTGCC ATATGGAGCT ATGCAATGGG GGTGGCAGTT GTACAAAATA 
GAGCCATGCA ACAGTATGTG ACGGGAAGAT CATATCTAGA CATTGATATG TTCCAGCTAG 
GACAAGCAGT AGCACGTGAT GCCGAAGCTC AAATGAGCTC AACACTGGAA GATGAACTTG 
GAGTGACACA CGAATCTAAA GAAAGCTTGA AGAGACATAT AAGGAACATA AACAGTTCAG 
AGACATCTTT CCACAAACCG ACAGGTGGAT CAGCCATAGA GATGGCAATA GATGAAGAGC 
CAGAACAATT CGAACATAGA GCAGATCAAG AACAAAATGG AGAACCTCAA TCATCCATAA
TTCAATATGC CTGGGCAGAA GGAAATAGAA GCGATGATCA GACTGAGCAA GCTACAGAAT
CTGACAATAT CAAGACCGAA CAACAAAACA TCAGAGACAG ACTAAACAAG AGACTCAACG
ACAAGAAGAA ACAAAGCAGT CAACCACCCA CTAATCCCAC AAACAGAACA AACCAGGACG 
AAATAGATGA TCTGTTTAAC GCATTTGGAA GCAACTAATC GAATCAACAT TTTAATCTAA 
ATCAATAATA AATAAGAAAA ACTTAGGATT AAAGAATCCT ATCATACCGG AATATAGGGT 
GGTAAATTTA GAGTCTGCTT GAAACTCAAT CAATAGAGAG TTGATGGAAA GCGATGCTAA 
AAACTATCAA ATCATGGATT CTTGGGAAGA GGAATCAAGA GATAAATCAA CTAATATCTC 
CTCGGCCCTC AACATCATTG AATTCATACT CAGCACCGAC CCCCAAGAAG ACTTATCGGA 
AAACGACACA ATCAACACAA GAACCCAGCA ACTCAGTGCC ACCATCTGTC AACCAGAAAT 
CAAACCAACA GAAACAAGTG AGAAAGATAG TGGATCAACT GACAAAAATA GACAGTCCGG 
GTCATCACAC GAATGTACAA CAGAAGCAAA AGATAGAAAT ATTGATCAGG AAACTGTACA 
GAGAGGACCT GGGAGAAGAA GCAGCTCAGA TAGTAGAGCT GAGACTGTGG TCTCTGGAGG 
AATCCCCAGA AGCATCACAG ATTCTAAAAA TGGAACCCAA AACACGGAGG ATATTGATCT 
CAATGAAATT AGAAAGATGG ATAAGGACTC TATTGAGGGG AAAATGCGAC AATCTGCAAA 
TGTTCCAAGC GAGATATCAG GAAGTGATGA CATATTTACA ACAGAACAAA GTAGAAACAG
TGATCATGGA AGAAGCCTGG AATCTATCAG TACACCTGAT ACAAGATCAA TAAGTGTTGT
TACTGCTGCA ACACCAGATG ATGAAGAAGA AATACTAATG AAAAATAGTA GGACAAAGAA 
AAGTTCTTCA ACACATCAAG AAGATGACAA AAGAATTAAA AAAGGGGGAA AAGGGAAAGA 
CTGGTTTAAG AAATCAAAAG ATACCGACAA CCAGATACCA ACATCAGACT ACAGATCCAC
ATCAAAAGGG CAGAAGAAAA TCTCAAAGAC AACAACCACC AACACCGACA CAAAGGGGCA 
AACAGAAATA CAGACAGAAT CATCAGAAAC ACAATCCTCA TCATGGAATC TCATCATCGA 
CAACAACACC GACCGGAACG AACAGACAAG CACAACTCCT CCAACAACAA CTTCCAGATC 
AACTTATACA AAAGAATCGA TCCGAACAAA CTCTGAATCC AAACCCAAGA CACAAAAGAC 
AAATGGAAAG GAAAGGAAGG ATACAGAAGA GAGCAATCGA TTTACAGAGA GGGCAATTAC 
TCTATTGCAG AATCTTGGTG TAATTCAATC CACATCAAAA CTAGATTTAT ATCAAGACAA 
ACGAGTTGTA TGTGTAGCAA ATGTACTAAA CAATGTAGAT ACTGCATCAA AGATAGATTT
CCTGGCAGGA TTAGTCATAG GGGTTTCAAT GGACAACGAC ACAAAATTAA CACAGATACA 
AAATGAAATG CTAAACCTCA AAGCAGATCT AAAGAAAATG GACGAATCAC ATAGAAGATT 
GATAGAAAAT CAAAGAGAAC AACTGTCATT GATCACGTCA CTAATTTCAA ATCTCAAAAT 
TATGACTGAG AGAGGAGGAA AGAAAGACCA AAATGAATCC AATGAGAGAG TATCCATGAT 
CAAAACAAAA TTGAAAGAAG AAAAGATCAA GAAGACCAGG TTTGACCCAC TTATGGAGGC 
ACAAGGCATT GACAAGAATA TACCCGATCT ATATCGACAT GCAGGAGATA CACTAGAGAA 
CGATGTACAA GTTAAATCAG AGATATTAAG TTCATACAAT GAGTCAAATG CAACAAGACT 
AATACCCAAA AAAGTGAGCA GTACAATGAG ATCACTAGTT GCAGTCATCA ACAACAGCAA 
TCTCTCACAA AGCACAAAAC AATCATACAT AAACGAACTC AAACGTTGCA AAAATGATGA 
AGAAGTATCT GAATTAATGG ACATGTTCAA TGAAGATGTC AACAATTGCC AATGATCCAA 
CAAAGAAACG ACACCGAACA AACAGACAAG AAACAACAGT AGATCAAAAC CTGTCAACAC 
ACACAAAATC AAGCAGAATG AAACAACAGA TATCAATCAA TATACAAATA AGAAAAACTT 
AGGATTAAAG AATAAATTAA TCCTTGTCCA AAATGAGTAT AACTAACTCT GCAATATACA 
CATTCCCAGA ATCATCATTC TCTGAAAATG GTCATATAGA ACCATTACCA CTCAAAGTCA
ATGAACAGAG GAAAGCAGTA CCCCACATTA GAGTTGCCAA GATCGGAAAT CCACCAAAAC
ACGGATCCCG GTATTTAGAT GTCTTCTTAC TCGGCTTCTT CGAGATGGAA CGAATCAAAG 
ACAAATACGG GAGTGTGAAT GATCTCGACA GTGACCCGAG TTACAAAGTT TGTGGCTCTG 
GATCATTACC AATCGGATTG GCTAAGTACA CTGGGAATGA CCAGGAATTG TTACAAGCCG 
CAACCAAACT GGATATAGAA GTGAGAAGAA CAGTCAAAGC GAAAGAGATG GTTGTTTACA 
CGGTACAAAA TATAAAACCA GAACTGTACC CATGGTCCAA TAGACTAAGA AAAGGAATGC 
TGTTCGATGC CAACAAAGTT GCTCTTGCTC CTCAATGTCT TCCACTAGAT AGGAGCATAA 
AATTTAGAGT AATCTTCGTG AATTGTACGG CAATTGGATC AATAACCTTG TTCAAAATTC 
CTAAGTCAAT GGCATCACTA TCTCTACCCA ACACAATATC AATCAATCTG CAGGTACACA
TAAAAACAGG GGTTCAGACT GATTCTAAAG GGATAGTTCA AATTTTGGAT GAGAAAGGCG 
AAAAATCACT GAATTTCATG GTCCATCTCG GATTGATCAA AAGAAAAGTA GGCAGAATGT 
ACTCTGTTGA ATACTGTAAA CAGAAAATCG AGAAAATGAG ATTGATATTT TCTTTAGGAC 
TAGTTGGAGG AATCAGTCTT CATGTCAATG CAACTGGGTC CATATCAAAA ACACTAGCAA 
GTCAGCTGGT ATTCAAAAGA GAGATTTGTT ATCCTTTAAT GGATCTAAAT CCGCATCTCA 
ATCTAGTTAT CTGGGCTTCA TCAGTAGAGA TTACAAGAGT GGATGCAATT TTCCAACCTT 
CTTTACCTGG CGAGTTCAGA TACTATCCTA ATATTATTGC AAAAGGAGTT GGGAAAATCA 
AACAATGGAA CTAGTAATCT CTATTTTAGT CCGGACGTAT CTATTAAGCC GAAGCAAATA 
AAGGATAATC AAAAACTTAG GACAAAAGAG GTCAATACCA ACAACTATTA GCAGTCACAC 
TCGCAAGAAT AAGAGAGAAG GGACCAAAAA AGTCAAATAG GAGAAATCAA AACAAAAGGT 
ACAGAACACC AGAACAACAA AATCAAAACA TCCAACTCAC TCAAAACAAA AATTCCAAAA
GAGACCGGCA ACACAACAAG CACTGAACAC AATGCCAACT TCAATACTGC TAATTATTAC 
AACCATGATC ATGGCATCTT TCTGCCAAAT AGATATCACA AAACTACAGC ACGTAGGTGT 
ATTGGTCAAC AGTCCCAAAG GGATGAAGAT ATCACAAAAC TTTGAAACAA GATATCTAAT 
TTTGAGCCTC ATACCAAAAA TAGAAGACTC TAACTCTTGT GGTGACCAAC AGATCAAGCA
ATACAAGAAG TTATTGGATA GACTGATCAT CCCTTTATAT GATGGATTAA GATTACAGAA 
AGATGTGATA GTAACCAATC AAGAATCCAA TGAAAACACT GATCCCAGAA CAAAACGATT
CTTTGGAGGG GTAATTGGAA CCATTGCTCT GGGAGTAGCA ACCTCAGCAC AAATTACAGC 5460

GGCAGTTGCT CTGGTTGAAG CCAAGCAGGC AAGATCAGAC ATCGAAAAAC TCAAAGAAGC 5520 

AATTAGGGAC ACAAACAAAG CAGTGCAGTC AGTTCAGAGC TCCATAGGAA ATTTAATAGT 5580

AGCAATTAAA TCAGTCCAGG ATTATGTTAA CAAAGAAATC GTGCCATCGA TTGCGAGGCT 5640

AGGTTGTGAA GCAGCAGGAC TTCAATTAGG AATTGCATTA ACACAGCATT ACTCAGAATT 5700 

AACAAACATA TTTGGTGATA ACATAGGATC GTTACAAGAA AAAGGAATAA AATTACAAGG 5760 

TATAGCATCA TTATACCGCA CAAATATCAC AGAAATATTC ACAACATCAA CAGTTGATAA 5820

ATATGATATC TATGATCTGT TATTTACAGA ATCAATAAAG GTGAGAGTTA TAGATGTTGA 5880

CTTGAATGAT TACTCAATCA CCCTCCAAGT CAGACTCCCT TTATTAACTA GGCTGCTGAA 5940 

CACTCAGATC TACAAAGTAG ATTCCATATC ATATAACATC CAAAACAGAG AATGGTATAT 6000 

CCCTCTTCCC AGCCATATCA TGACGAAAGG GGCATTTCTA GGTGGAGCAG ACGTCAAAGA 6060

ATGTATAGAA GCATTCAGCA GCTATATATG CCCTTCTGAT CCAGGATTTG TATTAAACCA 6120 

TGAAATAGAG AGCTGCTTAT CAGGAAACAT ATCCCAATGT CCAAGAACAA CGGTCACATC 6180 

AGACATTGTT CCAAGATATG CATTTGTCAA TGGAGGAGTG GTTGCAAACT GTATAACAAC 6240 

CACCTGTACA TGCAACGGAA TTGGTAATAG AATCAATCAA CCACCTGATC AAGGAGTAAA 6300 

AATTATAACA CATAAAGAAT GTAGTACAAT AGGTATCAAC GGAATGCTGT TCAATACAAA 6360 

TAAAGAAGGA ACTCTTGCAT TCTATACACC AAATGATATA ACACTAAACA ATTCTGTTGC 6420 

ACTTGATCCA ATTGACATAT CAATCGAGCT CAACAAGGCC AAATCAGATC TAGAAGAATC 6480 

AAAAGAATGG ATAAGAAGGT CAAATCAAAA ACTAGATTCT ATTGGAAATT GGCATCAATC 6540

TAGCACTACA ATCATAATTA TTTTGATAAT GATCATTATA TTGTTTATAA TTAATATAAC 6600 

GATAATTACA ATTGCAATTA AGTATTACAG AATTCAAAAG AGAAATCGAG TGGATCAAAA 6660 

TGACAAGCCA TATGTACTAA CAAACAAATA ACATATCTAC AGATCATTAG ATATTAAAAT 6720

TATAAAAAAC TTAGGAGTAA AGTTACGCAA TCCAACTCTA CTCATATAAT TGAGGAAGGA 6780 

CCCAATAGAC AAATCCAAAT TCGAGATGGA ATACTGGAAG CATACCAATC ACGGAAAGGA 6840 

TGCTGGTAAT GAGCTGGAGA CGTCTATGGC TACTCATGGC AACAAGCTCA CTAATAAGAT 6900

AATATACATA TTATGGACAA TAATCCTGGT GTTATTATCA ATAGTCTTCA TCATAGTGCT 6960 
AATTAATTCC ATCAAAAGTG AAAAGGCCCA CGAATCATTG CTGCAAGACA TAAATAATGA
GTTTATGGAA ATTACAGAAA AGATCCAAAT GGCATCGGAT AATACCAATG ATCTAATACA 
GTCAGGAGTG AATACAAGGC TTCTTACAAT TCAGAGTCAT GTCCAGAATT ACATACCAAT 
ATCATTGACA CAACAGATGT CAGATCTTAG GAAATTCATT AGTGAAATTA CAATTAGAAA 
TGATAATCAA GAAGTGCTGC CACAAAGAAT AACACATGAT GTAGGTATAA AACCTTTAAA 
TCCAGATGAT TTTTGGAGAT GCACGTCTGG TCTTCCATCT TTAATGAAAA CTCCAAAAAT 
AAGGTTAATG CCAGGGCCGG GATTATTAGC TATGCCAACG ACTGTTGATG GCTGTGTTAG 
AACTCCGTCT TTAGTTATAA ATGATCTGAT TTATGCTTAT ACCTCAAATC TAATTACTCG 
AGGTTGTCAG GATATAGGAA AATCATATCA AGTCTTACAG ATAGGGATAA TAACTGTAAA 
CTCAGACTTG GTACCTGACT TAAATCCTAG GATCTCTCAT ACCTTTAACA TAAATGACAA 
TAGGAAGTCA TGTTCTCTAG CACTCCTAAA TACAGATGTA TATCAACTGT GTTCAACTCC 
CAAAGTTGAT GAAAGATCAG ATTATGCATC ATCAGGCATA GAAGATATTG TACTTGATAT 
TGTCAATTAT GATGGTTCAA TCTCAACAAC AAGATTTAAG AATAATAACA TAAGCTTTGA 
TCAACCATAT GCTGCACTAT ACCCATCTGT TGGACCAGGG ATATACTACA AAGGCAAAAT 
AATATTTCTC GGGTATGGAG GTCTTGAACA TCCAATAAAT GAGAATGTAA TCTGCAACAC 
AACTGGGTGC CCCGGGAAAA CACAGAGAGA CTGTAATCAA GCGTCTCATA GTCCATGGTT 
TTCAGATAGG AGGATGGTCA ACTCCATCAT TGTTGTTGAC AAAGGCTTAA ACTCAATTCC 
AAAATTGAAA GTATGGACGA TATCTATGCG ACAAAATTAC TGGGGGTCAG AAGGAAGGTT 
ACTTCTACTA GGTAACAAGA TCTATATATA TACAAGATCT ACAAGTTGGC ATAGCAAGTT 
ACAATTAGGA ATAATTGATA TTACTGATTA CAGTGATATA AGGATAAAAT GGACATGGCA 
TAATGTGCTA TCAAGACCAG GAAACAATGA ATGTCCATGG GGACATTCAT GTCCAGATGG 
ATGTATAACA GGAGTATATA CTGATGCATA TCCACTCAAT CCCACAGGGA GCATTGTGTC 
ATCTGTCATA TTAGACTCAC AAAAATCGAG AGTGAACCCA GTCATAACTT ACTCAACAGC 
AACCGAAAGA GTAAACGAGC TGGCCATCCT AAACAGAACA CTCTCAGCTG GATATACAAC 
AACAAGCTGC ATTACACACT ATAACAAAGG ATATTGTTTT CATATAGTAG AAATAAATCA 
TAAAAGCTTA AACACATTTC AACCCATGTT GTTCAAAACA GAGATTCCAA AAAGCTGCAG
TTAATCATAA TTAACCATAA TATGCATCAA TCTATCTATA ATACAAGTAT ATGATAAGTA 8580

ATCAGCAATC AGACAATAGA CAAAAGGGAA ATATAAAAAA CTTAGGAGCA AAGCGTGCTC 8640 

GGGAAATGGA CACTGAATCT AACAATGGCA CTGTATCTGA CATACTCTAT CCTGAGTGTC 8700 

ACCTTAACTC TCCTATCGTT AAAGGTAAAA TAGCACAATT ACACACTATT ATGAGTCTAC 8760 

CTCAGCCTTA TGATATGGAT GACGACTCAA TACTAGTTAT CACTAGACAG AAAATAAAAC 8820 

TTAATAAATT GGATAAAAGA CAACGATCTA TTAGAAGATT AAAATTAATA TTAACTGAAA 8880 

AAGTGAATGA CTTAGGAAAA TACACATTTA TCAGATATCC AGAAATGTCA AAAGAAATGT 8940 

TCAAATTATA TATACCTGGT ATTAACAGTA AAGTGACTGA ATTATTACTT AAAGCAGATA 9000 

GAACATATAG TCAAATGACT GATGGATTAA GAGATCTATG GATTAATGTG CTATCAAAAT 9060 

TAGCCTCAAA AAATGATGGA AGCAATTATG ATCTTAATGA AGAAATTAAT AATATATCGA 9120 

AAGTTCACAC AACCTATAAA TCAGATAAAT GGTATAATCC ATTCAAAACA TGGTTTACTA 9180 

TCAAGTATGA TATGAGAAGA TTACAAAAAG CTCGAAATGA GATCACTTTT AATGTTGGGA 9240 

AGGATTATAA CTTGTTAGAA GACCAGAAGA ATTTCTTATT GATACATCCA GAATTGGTTT 9300 

TGATATTAGA TAAACAAAAC TATAATGGTT ATCTAATTAC TCCTGAATTA GTATTGATGT 9360 

ATTGTGACGT AGTCGAAGGC CGATGGAATA TAAGTGCATG TGCTAAGTTA GATCCAAAAT 9420

TACAATCTAT GTATCAGAAA GGTAATAACC TGTGGGAAGT GATAGATAAA TTGTTTCCAA 9480 

TTATGGGAGA AAAGACATTT GATGTGATAT CGTTATTAGA ACCACTTGCA TTATCCTTAA 9540 

TTCAAACTCA TGATCCTGTT AAACAACTAA GAGGAGCTTT TTTAAATCAT GTGTTATCCG 9600 

AGATGGAATT AATATTTGAA TCTAGAGAAT CGATTAAGGA ATTTCTGAGT GTAGATTACA 9660 

TTGATAAAAT TTTAGATATA TTTAATAAGT CTACAATAGA TGAAATAGCA GAGATTTTCT 9720 

CTTTTTTTAG AACATTTGGG CATCCTCCAT TAGAAGCTAG TATTGCAGCA GAAAAGGTTA 9780 

GAAAATATAT GTATATTGGA AAACAATTAA AATTTGACAC TATTAATAAA TGTCATGCTA 9840

TCTTCTGTAC AATAATAATT AACGGATATA GAGAGAGGCA TGGTGGACAG TGGCCTCCTG 9900 

TGACATTACC TGATCATGCA CACGAATTCA TCATAAATGC TTACGGTTCA AACTCTGCGA 9960 

TATCATATGA AAATGCTGTT GATTATTACC AGAGCTTTAT AGGAATAAAA TTCAATAAAT 10020 

TCATAGAGCC TCAGTTAGAT GAGGATTTGA CAATTTATAT GAAAGATAAA GCATTATCTC 10080



CAAAAAAATC AAATTGGGAC ACAGTTTATC CTGCATCTAA TTTACTGTAC CGTACTAACG 10140



CATCCAACGA ATCACGAAGA TTAGTTGAAG TATTTATAGC AGATAGTAAA TTTGATCCTC 10200 

ATCAGATATT GGATTATGTA GAATCTGGGG ACTGGTTAGA TGATCCAGAA TTTAATATTT 10260

CTTATAGTCT TAAAGAAAAA GAGATCAAAC AGGAAGGTAG ACTCTTTGCA AAAATGACAT 10320 

ACAAAATGAG AGCTACACAA GTTTTATCAG AGACACTACT TGCAAATAAC ATAGGAAAAT 10380 

TCTTTCAAGA AAATGGGATG GTGAAGGGAG AGATTGAATT ACTTAAGAGA TTAACAACCA 10440 

TATCAATATC AGGAGTTCCA CGGTATAATG AAGTGTACAA TAATTCTAAA AGCCATACAG 10500 

ATGACCTTAA AACCTACAAT AAAATAAGTA ATCTTAATTT GTCTTCTAAT CAGAAATCAA 10560 

AGAAATTTGA ATTCAAGTCA ACGGATATCT ACAATGATGG ATACGAGACT GTGAGCTGTT 10620 

TCCTAACAAC AGATCTCAAA AAATACTGTC TTAATTGGAG ATATGAATCA ACAGCTCTAT 10680 

TTGGAGAAAC TTGCAACCAA ATATTTGGAT TAAATAAATT GTTTAATTGG TTACACCCTC 10740

GTCTTGAAGG AAGTACAATC TATGTAGGTG ATCCTTACTG TCCTCCATCA GATAAAGAAC 10800 

ATATATCATT AGAGGATCAC CCTGATTCTG GTTTTTACGT TCATAACCCA AGAGGGGGTA 10860 

TAGAAGGATT TTGTCAAAAA TTATGGACAC TCATATCTAT AAGTGCAATA CATCTAGCAG 10920 

CTGTTAGAAT AGGCGTGAGG GTGACTGCAA TGGTTCAAGG AGACAATCAA GCTATAGCTG 10980 

TAACCACAAG AGTACCCAAC AATTATGACT ACAGAGTTAA GAAGGAGATA GTTTATAAAG 11040 

ATGTAGTGAG ATTTTTTGAT TCATTAAGAG AAGTGATGGA TGATCTAGGT CATGAACTTA 11100 

AATTAAATGA AACGATTATA AGTAGCAAGA TGTTCATATA TAGCAAAAGA ATCTATTATG 11160 

ATGGGAGAAT TCTTCCTCAA GCTCTAAAAG CATTATCTAG ATGTGTCTTC TGGTCAGAGA 11220 

CAGTAATAGA CGAAACAAGA TCAGCATCTT CAAATTTGGC AACATCATTT GCAAAAGCAA 11280 

TTGAGAATGG TTATTCACCT GTTCTAGGAT ATGCATGCTC AATTTTTAAG AACATTCAAC 11340 

AACTATATAT TGCCCTTGGG ATGAATATCA ATCCAACTAT AACACAGAAT ATCAGAGATC 11400 

AGTATTTTAG GAATCCAAAT TGGATGCAAT ATGCCTCTTT AATACCTGCT AGTGTTGGGG 11460 

GATTCAATTA CATGGCCATG TCAAGATGTT TTGTAAGGAA TATTGGTGAT CCATCAGTTG 11520 

CCGCATTGGC TGATATTAAA AGATTTATTA AGGCGAATCT ATTAGACCGA AGTGTTCTTT 11580

ATAGGATTAT GAATCAAGAA CCAGGTGAGT CATCTTTTTT GGACTGGGCT TCAGATCCAT 11640
ATTCATGCAA TTTACCACAA TCTCAAAATA TAACCACCAT GATAAAAAAT ATAACAGCAA
GGAATGTATT ACAAGATTCA CCAAATCCAT TATTATCTGG ATTATTCACA AATACAATGA 
TAGAAGAAGA TGAAGAATTA GCTGAGTTCC TGATGGACAG GAAGGTAATT CTCCCTAGAG 
TTGCACATGA TATTCTAGAT AATTCTCTCA CAGGAATTAG AAATGCCATA GCTGGAATGT 
TAGATACGAC AAAATCACTA ATTCGGGTTG GCATAAATAG AGGAGGACTG ACATATAGTT 
TGTTGAGGAA AATCAGTAAT TACGATCTAG TACAATATGA AACACTAAGT AGGACTTTGC 
GACTAATTGT AAGTGATAAA ATCAAGTATG AAGATATGTG TTCGGTAGAC CTTGCCATAG 
CATTGCGACA AAAGATGTGG ATTCATTTAT CAGGAGGAAG GATGATAAGT GGACTTGAAA 
CGCCTGACCC ATTAGAATTA CTATCTGGGG TAGTAATAAC AGGATCAGAA CATTGTAAAA 
TATGTTATTC TTCAGATGGC ACAAACCCAT ATACTTGGAT GTATTTACCC GGTAATATCA 
AAATAGGATC AGCAGAAACA GGTATATCGT CATTAAGAGT TCCTTATTTT GGATCAGTCA 
CTGATGAAAG ATCTGAAGCA CAATTAGGAT ATATCAAGAA TCTTAGTAAA CCTGCAAAAG 
CCGCAATAAG AATAGCAATG ATATATACAT GGGCATTTGG TAATGATGAG ATATCTTGGA 
TGGAAGCCTC ACAGATAGCA CAAACACGTG CAAATTTTAC ACTAGATAGT CTCAAAATTT 
TAACACCGGT AGCTACATCA ACAAATTTAT CACACAGATT AAAGGATACT GCAACTCAGA 
TGAAATTCTC CAGTACATCA TTGATCAGAG TCAGCAGATT CATAACAATG TCCAATGATA 
ACATGTCTAT CAAAGAAGCT AATGAAACCA AAGATACTAA TCTTATTTAT CAACAAATAA 
TGTTAACAGG ATTAAGTGTT TTCGAATATT TATTTAGATT AAAAGAAACC ACAGGACACA
ACCCTATAGT TATGCATCTG CACATAGAAG ATGAGTGTTG TATTAAAGAA AGTTTTAATG 
ATGAACATAT TAATCCAGAG TCTACATTAG AATTAATTCG ATATCCTGAA AGTAATGAAT 
TTATTTATGA TAAAGACCCA CTCAAAGATG TGGACTTATC AAAACTTATG GTTATTAAAG 
ACCATTCTTA CACAATTGAT ATGAATTATT GGGATGATAC TGACATCATA CATGCAATTT 
CAATATGTAC TGCAATTACA ATAGCAGATA CTATGTCACA ATTAGATCGA GATAATTTAA 
AAGAGATAAT AGTTATTGCA AATGATGATG ATATTAATAG CTTAATCACT GAATTTTTGA 
CTCTTGACAT ACTTGTATTT CTCAAGACAT TTGGTGGATT ATTAGTAAAT CAATTTGCAT 
ACACTCTTTA TAGTCTAAAA ATAGAAGGTA GGGATCTCAT TTGGGATTAT ATAATGAGAA
CACTGAGAGA TACTTCCCAT TCAATATTAA AAGTATTATC TAATGCATTA TCTCATCCTA
AAGTATTCAA GAGGTTCTGG GATTGTGGAG TTTTAAACCC TATTTATGGT CCTAATACTG 
CTAGTCAAGA CCAGATAAAA CTTGCCCTAT CTATATGTGA ATATTCACTA GATCTATTTA 
TGAGAGAATG GTTGAATGGT GTATCACTTG AAATATACAT TTGTGACAGC GATATGGAAG 
TTGCAAATGA TAGGAAACAA GCCTTTATTT CTAGACACCT TTCATTTGTT TGTTGTTTAG 
CAGAAATTGC ATCTTTCGGA CCTAACCTGT TAAACTTAAC ATACTTGGAG AGACTTGATC 
TATTGAAACA ATATCTTGAA TTAAATATTA AAGAAGACCC TACTCTTAAA TATGTACAAA
TATCTGGATT ATTAATTAAA TCGTTCCCAT CAACTGTAAC ATACGTAAGA AAGACTGCAA 
TCAAATATCT AAGGATTCGC GGTATTAGTC CACCTGAGGT AATTGATGAT TGGGATCCGG 
TAGAAGATGA AAATATGCTG GATAACATTG TCAAAACTAT AAATGATAAC TGTAATAAAG 
ATAATAAAGG GAATAAAATT AACAATTTCT GGGGACTAGC ACTTAAGAAC TATCAAGTCC 
TTAAAATCAG ATCTATAACA AGTGATTCTG ATGATAATGA TAGACTAGAT GCTAATACAA 
GTGGTTTGAC ACTTCCTCAA GGAGGGAATT ATCTATCGCA TCAATTGAGA TTATTCGGAA 
TCAACAGCAC TAGTTGTCTG AAAGCTCTTG AGTTATCACA AATTTTAATG AAGGAAGTCA 
ATAAAGACAA GGACAGGCTC TTCCTGGGAG AAGGAGCAGG AGCTATGCTA GCATGTTATG 
ATGCCACATT AGGACCTGCA GTTAATTATT ATAATTCAGG TTTGAATATA ACAGATGTAA 
TTGGTCAACG AGAATTGAAA ATATTTCCTT CAGAGGTATC ATTAGTAGGT AAAAAATTAG 
GAAATGTGAC ACAGATTCTT AACAGGGTAA AAGTACTGTT CAATGGGAAT CCTAATTCAA 
GATGGATAGG AAATATGGAA TGTGAGAGCT TAATATGGAG TGAATTAAAT GATAAGTCCA 
TTGGATTAGT ACATTGTGAT ATGGAAGGAG CTATCGGTAA ATCAGAAGAA ACTGTTCTAC 
ATGAACATTA TAGTGTTATA AGAATTACAT ACTTGATTGG GGATGATGAT GTTGTTTTAG 
TTTCCAAAAT TATACCTACA ATCACTCCGA ATTGGTCTAG AATACTTTAT CTATATAAAT 
TATATTGGAA AGATGTAAGT ATAATATCAC TCAAAACTTC TAATCCTGCA TCAACAGAAT 
TATATCTAAT TTCGAAAGAT GCATATTGTA CTATAATGGA ACCTAGTGAA ATTGTTTTAT
CAAAACTTAA AAGATTGTCA CTCTTGGAAG AAAATAATCT ATTAAAATGG ATCATTTTAT 
CAAAGAAGAG GAATAATGAA TGGTTACATC ATGAAATCAA AGAAGGAGAA AGAGATTATG
GAATCATGAG ACCATATCAT ATGGCACTAC AAATCTTTGG ATTTCAAATC AATTTAAATC 14820
ATCTGGCGAA AGAATTTTTA TCAACCCCAG ATCTGACTAA TATCAACAAT ATAATCCAAA 14880
GTTTTCAGCG AACAATAAAG GATGTTTTAT TTGAATGGAT TAATATAACT CATGATGATA 14940
AGAGACATAA ATTAGGCGGA AGATATAACA TATTCCCACT GAAAAATAAG GGAAAGTTAA 15000
GACTGCTATC GAGAAGACTA GTATTAAGTT GGATTTCATT ATCATTATCG ACTCGATTAC 15060
TTACAGGTCG CTTTCCTGAT GAAAAATTTG AACATAGAGC ACAGACTGGA TATGTATCAT 15120
TAGCTGATAC TGATTTAGAA TCATTAAAGT TATTGTCGAA AAACATCATT AAGAATTACA 15180
GAGAGTGTAT AGGATCAATA TCATATTGGT TTCTAACCAA AGAAGTTAAA ATACTTATGA 15240
AATTGATTGG TGGTGCTAAA TTATTAGGAA TTCCCAGACA ATATAAAGAA CCCGAAGACC 15300
AGTTATTAGA AAACTACAAT CAACATGATG AATTTGATAT CGATTAAAAC ATAAATACAA 15360
TGAAGATATA TCCTAACCTT TATCTTTAAG CCTAGGAATA GACAAAAAGT AAGAAAAACA 15420
TGTAATATAT ATATACCAAA CAGAGTTCTT CTCTTGTTTG GT 15462

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2233 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

Met Asp Thr Glu Ser Asn Asn Gly Thr Val Ser Asp Ile Leu Tyr Pro
1 5 10 15

Glu Cys His Leu Asn Ser Pro Ile Val Lys Gly Lys Ile Ala Gln Leu
20 25 30

His Thr Ile Met Ser Leu Pro Gln Pro Tyr Asp Met Asp Asp Asp Ser
35 40 45

Ile Leu Val Ile Thr Arg Gln Lys Ile Lys Leu Asn Lys Leu Asp Lys
50 55 60

Arg Gln Arg Ser Ile Arg Arg Leu Lys Leu Ile Leu Thr Glu Lys Val
65 70 75 80

Asn Asp Leu Gly Lys Tyr Thr Phe Ile Arg Tyr Pro Glu Met Ser Lys
85 90 95

Glu Met Phe Lys Leu Tyr Ile Pro Gly Ile Asn Ser Lys Val Thr Glu
100 105 110

Leu Leu Leu Lys Ala Asp Arg Thr Tyr Ser Gln Met Thr Asp Gly Leu
115 120 125

Arg Asp Leu Trp Ile Asn Val Leu Ser Lys Leu Ala Ser Lys Asn Asp
130 135 140

Gly Ser Asn Tyr Asp Leu Asn Glu Glu Ile Asn Asn Ile Ser Lys Val
145 150 155 160

His Thr Thr Tyr Lys Ser Asp Lys Trp Tyr Asn Pro Phe Lys Thr Trp
165 170 175

Phe Thr Ile Lys Tyr Asp Met Arg Arg Leu Gln Lys Ala Arg Asn Glu
180 185 190

Ile Thr Phe Asn Val Gly Lys Asp Tyr Asn Leu Leu Glu Asp Gln Lys
195 200 205

Asn Phe Leu Leu Ile His Pro Glu Leu Val Leu Ile Leu Asp Lys Gln
210 215 220

Asn Tyr Asn Gly Tyr Leu Ile Thr Pro Glu Leu Val Leu Met Tyr Cys
225 230 235 240

Asp Val Val Glu Gly Arg Trp Asn Ile Ser Ala Cys Ala Lys Leu Asp
245 250 255

Pro Lys Leu Gln Ser Met Tyr Gln Lys Gly Asn Asn Leu Trp Glu Val
260 265 270

Ile Asp Lys Leu Phe Pro Ile Met Gly Glu Lys Thr Phe Asp Val Ile
275 280 285

Ser Leu Leu Glu Pro Leu Ala Leu Ser Leu Ile Gln Thr His Asp Pro
290 295 300

Val Lys Gln Leu Arg Gly Ala Phe Leu Asn His Val Leu Ser Glu Met
305 310 315 320

Glu Leu Ile Phe Glu Ser Arg Glu Ser Ile Lys Glu Phe Leu Ser Val
325 330 335

Asp Tyr Ile Asp Lys Ile Leu Asp Ile Phe Asn Lys Ser Thr Ile Asp
340 345 350

Glu Ile Ala Glu Ile Phe Ser Phe Phe Arg Thr Phe Gly His Pro Pro
355 360 365

Leu Glu Ala Ser Ile Ala Ala Glu Lys Val Arg Lys Tyr Met Tyr Ile
370 375 380

Gly Lys Gln Leu Lys Phe Asp Thr Ile Asn Lys Cys His Ala Ile Phe
385 390 395 400

Cys Thr Ile Ile Ile Asn Gly Tyr Arg Glu Arg His Gly Gly Gln Trp
405 410 415

Pro Pro Val Thr Leu Pro Asp His Ala His Glu Phe Ile Ile Asn Ala
420 425 430

Tyr Gly Ser Asn Ser Ala Ile Ser Tyr Glu Asn Ala Val Asp Tyr Tyr
435 440 445

Gln Ser Phe Ile Gly Ile Lys Phe Asn Lys Phe Ile Glu Pro Gln Leu
450 455 460

Asp Glu Asp Leu Thr Ile Tyr Met Lys Asp Lys Ala Leu Ser Pro Lys
465 470 475 480

Lys Ser Asn Trp Asp Thr Val Tyr Pro Ala Ser Asn Leu Leu Tyr Arg
485 490 495

Thr Asn Ala Ser Asn Glu Ser Arg Arg Leu Val Glu Val Phe Ile Ala
500 505 510

Asp Ser Lys Phe Asp Pro His Gln Ile Leu Asp Tyr Val Glu Ser Gly
515 520 525

Asp Trp Leu Asp Asp Pro Glu Phe Asn Ile Ser Tyr Ser Leu Lys Glu
530 535 540

Lys Glu Ile Lys Gln Glu Gly Arg Leu Phe Ala Lys Met Thr Tyr Lys
545 550 555 560

Met Arg Ala Thr Gln Val Leu Ser Glu Thr Leu Leu Ala Asn Asn Ile
565 570 575

Gly Lys Phe Phe Gln Glu Asn Gly Met Val Lys Gly Glu Ile Glu Leu
580 585 590

Leu Lys Arg Leu Thr Thr Ile Ser Ile Ser Gly Val Pro Arg Tyr Asn
595 600 605

Glu Val Tyr Asn Asn Ser Lys Ser His Thr Asp Asp Leu Lys Thr Tyr
610 615 620

Asn Lys Ile Ser Asn Leu Asn Leu Ser Ser Asn Gln Lys Ser Lys Lys
625 630 635 640

Phe Glu Phe Lys Ser Thr Asp Ile Tyr Asn Asp Gly Tyr Glu Thr Val
645 650 655

Ser Cys Phe Leu Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Ser Thr Ala Leu Phe Gly Glu Thr Cys Asn Gln Ile Phe Gly
675 680 685

Leu Asn Lys Leu Phe Asn Trp Leu His Pro Arg Leu Glu Gly Ser Thr
690 695 700

Ile Tyr Val Gly Asp Pro Tyr Cys Pro Pro Ser Asp Lys Glu His Ile
705 710 715 720

Ser Leu Glu Asp His Pro Asp Ser Gly Phe Tyr Val His Asn Pro Arg
725 730 735

Gly Gly Ile Glu Gly Phe Cys Gln Lys Leu Trp Thr Leu Ile Ser Ile
740 745 750

Ser Ala Ile His Leu Ala Ala Val Arg Ile Gly Val Arg Val Thr Ala
755 760 765

Met Val Gln Gly Asp Asn Gln Ala Ile Ala Val Thr Thr Arg Val Pro
770 775 780

Asn Asn Tyr Asp Tyr Arg Val Lys Lys Glu Ile Val Tyr Lys Asp Val
785 790 795 800

Val Arg Phe Phe Asp Ser Leu Arg Glu Val Met Asp Asp Leu Gly His
805 810 815

Glu Leu Lys Leu Asn Glu Thr Ile Ile Ser Ser Lys Met Phe Ile Tyr
820 825 830

Ser Lys Arg Ile Tyr Tyr Asp Gly Arg Ile Leu Pro Gln Ala Leu Lys
835 840 845

Ala Leu Ser Arg Cys Val Phe Trp Ser Glu Thr Val Ile Asp Glu Thr
850 855 860

Arg Ser Ala Ser Ser Asn Leu Ala Thr Ser Phe Ala Lys Ala Ile Glu
865 870 875 880

Asn Gly Tyr Ser Pro Val Leu Gly Tyr Ala Cys Ser Ile Phe Lys Asn
885 890 895

Ile Gln Gln Leu Tyr Ile Ala Leu Gly Met Asn Ile Asn Pro Thr Ile
900 905 910

Thr Gln Asn Ile Arg Asp Gln Tyr Phe Arg Asn Pro Asn Trp Met Gln
915 920 925

Tyr Ala Ser Leu Ile Pro Ala Ser Val Gly Gly Phe Asn Tyr Met Ala
930 935 940

Met Ser Arg Cys Phe Val Arg Asn Ile Gly Asp Pro Ser Val Ala Ala
945 950 955 960

Leu Ala Asp Ile Lys Arg Phe Ile Lys Ala Asn Leu Leu Asp Arg Ser
965 970 975

Val Leu Tyr Arg Ile Met Asn Gln Glu Pro Gly Glu Ser Ser Phe Leu
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Cys Asn Leu Pro Gln Ser Gln Asn
995 1000 1005

Ile Thr Thr Met Ile Lys Asn Ile Thr Ala Arg Asn Val Leu Gln Asp
1010 1015 1020

Ser Pro Asn Pro Leu Leu Ser Gly Leu Phe Thr Asn Thr Met Ile Glu
1025 1030 1035 1040

Glu Asp Glu Glu Leu Ala Glu Phe Leu Met Asp Arg Lys Val Ile Leu
1045 1050 1055

Pro Arg Val Ala His Asp Ile Leu Asp Asn Ser Leu Thr Gly Ile Arg
1060 1065 1070

Asn Ala Ile Ala Gly Met Leu Asp Thr Thr Lys Ser Leu Ile Arg Val
1075 1080 1085

Gly Ile Asn Arg Gly Gly Leu Thr Tyr Ser Leu Leu Arg Lys Ile Ser
1090 1095 1100

Asn Tyr Asp Leu Val Gln Tyr Glu Thr Leu Ser Arg Thr Leu Arg Leu
1105 1110 1115 1120

Ile Val Ser Asp Lys Ile Lys Tyr Glu Asp Met Cys Ser Val Asp Leu
1125 1130 1135

Ala Ile Ala Leu Arg Gln Lys Met Trp Ile His Leu Ser Gly Gly Arg
1140 1145 1150

Met Ile Ser Gly Leu Glu Thr Pro Asp Pro Leu Glu Leu Leu Ser Gly
1155 1160 1165

Val Val Ile Thr Gly Ser Glu His Cys Lys Ile Cys Tyr Ser Ser Asp
1170 1175 1180

Gly Thr Asn Pro Tyr Thr Trp Met Tyr Leu Pro Gly Asn Ile Lys Ile
1185 1190 1195 1200

Gly Ser Ala Glu Thr Gly Ile Ser Ser Leu Arg Val Pro Tyr Phe Gly
1205 1210 1215

Ser Val Thr Asp Glu Arg Ser Glu Ala Gln Leu Gly Tyr Ile Lys Asn
1220 1225 1230

Leu Ser Lys Pro Ala Lys Ala Ala Ile Arg Ile Ala Met Ile Tyr Thr
1235 1240 1245

Trp Ala Phe Gly Asn Asp Glu Ile Ser Trp Met Glu Ala Ser Gln Ile
1250 1255 1260

Ala Gln Thr Arg Ala Asn Phe Thr Leu Asp Ser Leu Lys Ile Leu Thr
1265 1270 1275 1280

Pro Val Ala Thr Ser Thr Asn Leu Ser His Arg Leu Lys Asp Thr Ala
1285 1290 1295

Thr Gln Met Lys Phe Ser Ser Thr Ser Leu Ile Arg Val Ser Arg Phe
1300 1305 1310

Ile Thr Met Ser Asn Asp Asn Met Ser Ile Lys Glu Ala Asn Glu Thr
1315 1320 1325

Lys Asp Thr Asn Leu Ile Tyr Gln Gln Ile Met Leu Thr Gly Leu Ser
1330 1335 1340

Val Phe Glu Tyr Leu Phe Arg Leu Lys Glu Thr Thr Gly His Asn Pro
1345 1350 1355 1360

Ile Val Met His Leu His Ile Glu Asp Glu Cys Cys Ile Lys Glu Ser
1365 1370 1375

Phe Asn Asp Glu His Ile Asn Pro Glu Ser Thr Leu Glu Leu Ile Arg
1380 1385 1390

Tyr Pro Glu Ser Asn Glu Phe Ile Tyr Asp Lys Asp Pro Leu Lys Asp
1395 1400 1405

Val Asp Leu Ser Lys Leu Met Val Ile Lys Asp His Ser Tyr Thr Ile
1410 1415 1420

Asp Met Asn Tyr Trp Asp Asp Thr Asp Ile Ile His Ala Ile Ser Ile
1425 1430 1435 1440

Cys Thr Ala Ile Thr Ile Ala Asp Thr Met Ser Gln Leu Asp Arg Asp
1445 1450 1455

Asn Leu Lys Glu Ile Ile Val Ile Ala Asn Asp Asp Asp Ile Asn Ser
1460 1465 1470

Leu Ile Thr Glu Phe Leu Thr Leu Asp Ile Leu Val Phe Leu Lys Thr
1475 1480 1485

Phe Gly Gly Leu Leu Val Asn Gln Phe Ala Tyr Thr Leu Tyr Ser Leu
1490 1495 1500

Lys Ile Glu Gly Arg Asp Leu Ile Trp Asp Tyr Ile Met Arg Thr Leu
1505 1510 1515 1520

Arg Asp Thr Ser His Ser Ile Leu Lys Val Leu Ser Asn Ala Leu Ser
1525 1530 1535

His Pro Lys Val Phe Lys Arg Phe Trp Asp Cys Gly Val Leu Asn Pro
1540 1545 1550

Ile Tyr Gly Pro Asn Thr Ala Ser Gln Asp Gln Ile Lys Leu Ala Leu
1555 1560 1565

Ser Ile Cys Glu Tyr Ser Leu Asp Leu Phe Met Arg Glu Trp Leu Asn
1570 1575 1580

Gly Val Ser Leu Glu Ile Tyr Ile Cys Asp Ser Asp Met Glu Val Ala
1585 1590 1595 1600

Asn Asp Arg Lys Gln Ala Phe Ile Ser Arg His Leu Ser Phe Val Cys
1605 1610 1615

Cys Leu Ala Glu Ile Ala Ser Phe Gly Pro Asn Leu Leu Asn Leu Thr
1620 1625 1630

Tyr Leu Glu Arg Leu Asp Leu Leu Lys Gln Tyr Leu Glu Leu Asn Ile
1635 1640 1645

Lys Glu Asp Pro Thr Leu Lys Tyr Val Gln Ile Ser Gly Leu Leu Ile
1650 1655 1660

Lys Ser Phe Pro Ser Thr Val Thr Tyr Val Arg Lys Thr Ala Ile Lys
1665 1670 1675 1680

Tyr Leu Arg Ile Arg Gly Ile Ser Pro Pro Glu Val Ile Asp Asp Trp
1685 1690 1695

Asp Pro Val Glu Asp Glu Asn Met Leu Asp Asn Ile Val Lys Thr Ile
1700 1705 1710

Asn Asp Asn Cys Asn Lys Asp Asn Lys Gly Asn Lys Ile Asn Asn Phe
1715 1720 1725

Trp Gly Leu Ala Leu Lys Asn Tyr Gln Val Leu Lys Ile Arg Ser Ile
1730 1735 1740

Thr Ser Asp Ser Asp Asp Asn Asp Arg Leu Asp Ala Asn Thr Ser Gly
1745 1750 1755 1760

Leu Thr Leu Pro Gln Gly Gly Asn Tyr Leu Ser His Gln Leu Arg Leu
1765 1770 1775

Phe Gly Ile Asn Ser Thr Ser Cys Leu Lys Ala Leu Glu Leu Ser Gln
1780 1785 1790

Ile Leu Met Lys Glu Val Asn Lys Asp Lys Asp Arg Leu Phe Leu Gly
1795 1800 1805

Glu Gly Ala Gly Ala Met Leu Ala Cys Tyr Asp Ala Thr Leu Gly Pro
1810 1815 1820

Ala Val Asn Tyr Tyr Asn Ser Gly Leu Asn Ile Thr Asp Val Ile Gly
1825 1830 1835 1840

Gln Arg Glu Leu Lys Ile Phe Pro Ser Glu Val Ser Leu Val Gly Lys
1845 1850 1855

Lys Leu Gly Asn Val Thr Gln Ile Leu Asn Arg Val Lys Val Leu Phe
1860 1865 1870

Asn Gly Asn Pro Asn Ser Thr Trp Ile Gly Asn Met Glu Cys Glu Ser
1875 1880 1885

Leu Ile Trp Ser Glu Leu Asn Asp Lys Ser Ile Gly Leu Val His Cys
1890 1895 1900

Asp Met Glu Gly Ala Ile Gly Lys Ser Glu Glu Thr Val Leu His Glu
1905 1910 1915 1920

His Tyr Ser Val Ile Arg Ile Thr Tyr Leu Ile Gly Asp Asp Asp Val
1925 1930 1935

Val Leu Val Ser Lys Ile Ile Pro Thr Ile Thr Pro Asn Trp Ser Arg
1940 1945 1950

Ile Leu Tyr Leu Tyr Lys Leu Tyr Trp Lys Asp Val Ser Ile Ile Ser
1955 1960 1965

Leu Lys Thr Ser Asn Pro Ala Ser Thr Glu Leu Tyr Leu Ile Ser Lys
1970 1975 1980

Asp Ala Tyr Cys Thr Ile Met Glu Pro Ser Glu Ile Val Leu Ser Lys
1985 1990 1995 2000

Leu Lys Arg Leu Ser Leu Leu Glu Glu Asn Asn Leu Leu Lys Trp Ile
2005 2010 2015

Ile Leu Ser Lys Lys Arg Asn Asn Glu Trp Leu His His Glu Ile Lys
2020 2025 2030

Glu Gly Glu Arg Asp Tyr Gly Ile Met Arg Pro Tyr His Met Ala Leu
2035 2040 2045

Gln Ile Phe Gly Phe Gln Ile Asn Leu Asn His Leu Ala Lys Glu Phe
2050 2055 2060

Leu Ser Thr Pro Asp Leu Thr Asn Ile Asn Asn Ile Ile Gln Ser Phe
2065 2070 2075 2080

Gln Arg Thr Ile Lys Asp Val Leu Phe Glu Trp Ile Asn Ile Thr His
2085 2090 2095

Asp Asp Lys Arg His Lys Leu Gly Gly Arg Tyr Asn Ile Phe Pro Leu
2100 2105 2110

Lys Asn Lys Gly Lys Leu Arg Leu Leu Ser Arg Arg Leu Val Leu Ser
2115 2120 2125

Trp Ile Ser Leu Ser Leu Ser Thr Arg Leu Leu Thr Gly Arg Phe Pro
2130 2135 2140

Asp Glu Lys Phe Glu His Arg Ala Gln Thr Gly Tyr Val Ser Leu Ala
2145 2150 2155 2160

Asp Thr Asp Leu Glu Ser Leu Lys Leu Leu Ser Lys Asn Ile Ile Lys
2165 2170 2175

Asn Tyr Arg Glu Cys Ile Gly Ser Ile Ser Tyr Trp Phe Leu Thr Lys
2180 2185 2190

Glu Val Lys Ile Leu Met Lys Leu Ile Gly Gly Ala Lys Leu Leu Gly
2195 2200 2205

Ile Pro Arg Gln Tyr Lys Glu Pro Glu Asp Gln Leu Leu Glu Asn Tyr
2210 2215 2220

Asn Gln His Asp Glu Phe Asp Ile Asp
2225 2230

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15462 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
ACCAAACAAG AGAAGAAACT TGCTTGGTAA TATAAATTTA ACTTAAAATT AACTTAGGAT 60
TTAAGACATT GACTAGAAGG TCAAGAAAAG GGAACTCTAT AATTTCAAAA ATGTTGAGCC 120
TATTTGATAC ATTTAATGCA CGTAGGCAAG AAAACATAAC AAAATCAGCC GGTGGAGCTA 180
TCATTCCTGG ACAGAAAAAT ACTGTCTCTA TATTCGCCCT TGGACCGACA ATAACTGATG 240
ATAATGAGAA AATGACATTA GCTCTTCTAT TTCTATCTCA TTCACTAGAT AATGAGAAAC 300
AACATGCACA AAGGGCAGGG TTCTTGGTGT CTTTATTGTC AATGGCTTAT GCCAATCCAG 360
AGCTCTACCT AACAACAAAT GGAAGTAATG CAGATGCCAA GTATGTCATA TACATGATTG 420
AGAAAGATCT AAAACGGCAA AAGTATGGAG GATTTGTGGT TAAGACGAGA GAGATGATAT 480
ATGAAAAGAC AACTGATTGG ATATTTGGAA GTGACCTGGA TTATGATCAG GAAACTATGT 540
TGCAGAACGG CAGGAACAAT TCAACAATTG AAGACCTTGT CCACACATTT GGGTATCCAT 600
CATGTTTAGG AGCTCTTATA ATACAGATCT GGATAGTTCT GGTCAAAGCT ATCACTAGTA 660
TCTCAGGGTT AAGAAAAGGC TTTTTCACCC GATTGGAAGC TTTCAGACAA GATGGAACAG 720
TGCAGGCAGG GCTGGTATTG AGCGGTGACA CAGTGGATCA GATTGGGTCA ATCATGCGGT 780
CTCAACAGAG CTTGGTAACT CTTATGGTTG AAACATTAAT AACAATGAAT ACCAGCAGAA 840
ATGACCTCAC AACCATAGAA AAGAATATAC AAATTGTTGG CAACTACATA AGAGATGCAG 900
GTCTCGCTTC ATTCTTCAAT ACAATCAGAT ATGGAATTGA GACCAGAATG GCAGCTTTGA 960
CTCTATCCAC TCTCAGACCA GATATCAATA GATTAAAAGC TTTGATGGAA CTGTATTTAT 1020
CAAAGGGACC ACGCGCTCCT TTCATCTGTA TCCTCAGAGA TCCTATACAT GGTGAGTTCG 1080
CACCAGGCAA CTATCCTGCC ATATGGAGCT ATGCAATGGG GGTGGCAGTT GTACAAAATA 1140
GAGCCATGCA ACAGTATGTG ACGGGAAGAT CATATCTAGA CATTGATATG TTCCAGCTAG 1200
GACAAGCAGT AGCACGTGAT GCCGAAGCTC AAATGAGCTC AACACTGGAA GATGAACTTG 1260
GAGTGACACA CGAAGCTAAA GAAAGCTTGA AGAGACATAT AAGGAACATA AACAGTTCAG 1320
AGACATCTTT CCACAAACCG ACAGGTGGAT CAGCCATAGA GATGGCAATA GATGAAGAGC 1380
CAGAACAATT CGAACATAGA GCAGATCAAG AACAAAATGG AGAACCTCAA TCATCCATAA 1440
TTCAATATGC CTGGGCAGAA GGAAATAGAA GCGATGATCA GACTGAGCAA GCTACAGAAT 1500
CTGACAATAT CAAGACCGAA CAACAAAACA TCAGAGACAG ACTAAACAAG AGACTCAACG 1560
ACAAGAAGAA ACAAAGCAGT CAACCACCCA CTAATCCCAC AAACAGAACA AACCAGGACG 1620
AAATAGATGA TCTGTTTAAC GCATTTGGAA GCAACTAATC GAATCAACAT TTTAATCTAA 1680
ATCAATAATA AATAAGAAAA ACTTAGGATT AAAGAATCCT ATCATACCGG AATATAGGGT 1740
GGTAAATTTA GAGTCTGCTT GAAACTCAAT CAATAGAGAG TTGATGGAAA GCGATGCTAA 1800
AAACTATCAA ATCATGGATT CTTGGGAAGA GGAATCAAGA GATAAATCAA CTAATATCTC 1860
CTCGGCCCTC AACATCATTG AATTCATACT CAGCACCGAC CCCCAAGAAG ACTTATCGGA 1920
AAACGACACA ATCAACACAA GAACCCAGCA ACTCAGTGCC ACCATCTGTC AACCAGAAAT 1980
CAAACCAACA GAAACAAGTG AGAAAGATAG TGGATCAACT GACAAAAATA GACAGTCTGG 2040
GTCATCACAC GAATGTACAA CAGAAGCAAA AGATAGAAAC ATTGATCAGG AAACTGTACA 2100
GAGAGGACCT GGGAGAAGAA GCAGCTCAGA TAGTAGAGCT GAGACTGTGG TCTCTGGAGG 2160
AATCCCCAGA AGCATCACAG ATTCTAAAAA TGGAACCCAA AACACGGAGG ATATTGATCT 2220
CAATGAAATT AGAAAGATGG ATAAGGACTC TATTGAGGGG AAAATGCGAC AATCTGCAAA 2280
TGTTCCAAGC GAGATATCAG GAAGTGATGA CATATTTACA ACAGAACAAA GTAGAAACAG 2340
TGATCATGGA AGAAGCCTGG AATCTATCAG TACACCTGAT ACAAGATCAA TAAGTGTTGT 2400
TACTGCTGCA ACACCAGATG ATGAAGAAGA AATACTAATG AAAAATAGTA GGACAAAGAA 2460
AAGTTCTTCA ACACATCAAG AAGATGACAA AAGAATTAAA AAAGGGGGAA AAGGGAAAGA 2520
CTGGTTTAAG AAATCAAAAG ATACCGACAA CCAGATACCA ACATCAGACT ACAGATCCAC 2580
ATCAAAAGGG CAGAAGAAAA TCTCAAAGAC AACAACCACC AACACCGACA CAAAGGGGCA 2640
AACAGAAATA CAGACAGAAT CATCAGAAAC ACAATCCTCA TCATGGAATC TCATCATCGA 2700
CAACAACACC GACCGGAACG AACAGACAAG CACAACTCCT CCAACAACAA CTTCCAGATC 2760
AACTTATACA AAAGAATCGA TCCGAACAAA CTCTGAATCC AAACCCAAGA CACAAAAGAC 2820
AAATGGAAAG GAAAGGAAGG ATACAGAAGA GAGCAATCGA TTTACAGAGA GGGCAATTAC 2880
TCTATTGCAG AATCTTGGTG TAATTCAATC CACATCAAAA CTAGATTTAT ATCAAGACAA 2940
ACGAGTTGTA TGTGTAGCAA ATGTACTAAA CAATGTAGAT ACTGCATCAA AGATAGATTT 3000

CCTGGCAGGA TTAGTCATAG GGGTTTCAAT GGACAACGAC ACAAAATTAA CACAGATACA 3060 

AAATGAAATG CTAAACCTCA AAGCAGATCT AAAGAAAATG GACGAATCAC ATAGAAGATT 3120 

GATAGAAAAT CAAAGAGAAC AACTGTCATT GATCACGTCA CTAATTTCAA ATCTCAAAAT 3180

TATGACTGAG AGAGGAGGAA AGAAAGACCA AAATGAATCC AATGAGAGAG TATCCATGAT 3240 

CAAAACAAAA TTGAAAGAAG AAAAGATCAA GAAGACCAGG TTTGACCCAC TTATGGAGGC 3300 

ACAAGGCATT GACAAGAATA TACCCGATCT ATATCGACAT GCAGGAGATA CACTAGAGAA 3360 

CGATGTACAA GTTAAATCAG AGATATTAAG TTCATACAAT GAGTCAAATG CAACAAGACT 3420 

AATACCCAAA AAAGTGAGCA GTACAATGAG ATCACTAGTT GCAGTCATCA ACAACAGCAA 3480 

TCTCTCACAA AGCACAAAAC AATCATACAT AAACGAACTC AAACGTTGCA AAAATGATGA 3540 

AGAAGTATCT GAATTAATGG ACATGTTCAA TGAAGATGTC AACAATTGCC AATGATCCAA 3600 

CAAAGAAACG ACACCGAACA AACAGACAAG AAACAACAGT AGATCAAAAC CTGTCAACAC 3660 

ACACAAAATC AAGCAGAATG AAACAACAGA TATCAATCAA TATACAAATA AGAAAAACTT 3720 

AGGATTAAAG AATAAATTAA TCCTTGTCCA AAATGAGTAT AACTAACTCT GCAATATACA 3780 

CATTCCCAGA ATCATCATTC TCTGAAAATG GTCATATAGA ACCATTACCA CTCAAAGTCA 3840 

ATGAACAGAG GAAAGCAGTA CCCCACATTA GAGTTGCCAA GATCGGAAAT CCACCAAAAC 3900 

ACGGATCCCG GTATTTAGAT GTCTTCTTAC TCGGCTTCTT CGAGATGGAA CGAATCAAAG 3960 

ACAAATACGG GAGTGTGAAT GATCTCGACA GTGACCCGAG TTACAAAGTT TGTGGCTCTG 4020 

GATCATTACC AATCGGATTG GCTAAGTACA CTGGGAATGA CCAGGAATTG TTACAAGCCG 4080 

CAACCAAACT GGATATAGAA GTGAGAAGAA CAGTCAAAGC GAAAGAGATG GTTGTTTACA 4140 

CGGTACAAAA TATAAAACCA GAACTGTACC CATGGTCCAA TAGACTAAGA AAAGGAATGC 4200 

TGTTCGATGC CAACAAAGTT GCTCTTGCTC CTCAATGTCT TCCACTAGAT AGGAGCATAA 4260 

AATTTAGAGT AATCTTCGTG AATTGTACGG CAATTGGATC AATAACCTTG TTCAAAATTC 4320 

CTAAGTCAAT GGCATCACTA TCTCTAACCA ACACAATATC AATCAATCTG CAGGTACACA 4380 

TAAAAACAGG GGTTCAGACT GATTCTAAAG GGATAGTTCA AATTTTGGAT GAGAAAGGCG 4440 

AAAAATCACT GAATTTCATG GTCCATCTCG GATTGATCAA AAGAAAAGTA GGCAGAATGT 4500
ACTCTGTTGA ATACTGTAAA CAGAAAATCG AGAAAATGAG ATTGATATTT TCTTTAGGAC
TAGTTGGAGG AATCAGTCTT CATGTCAATG CAACTGGGTC CATATCAAAA ACACTAGCAA
GTCAGCTGGT ATTCAAAAGA GAGATTTGTT ATCCTTTAAT GGATCTAAAT CCGCATCTCA 
ATCTAGTTAT CTGGGCTTCA TCAGTAGAGA TTACAAGAGT GGATGCAATT TTCCAACCTT 
CTTTACCTGG CGAGTTCAGA TACTATCCTA ATATTATTGC AAAAGGAGTT GGGAAAATCA 
AACAATGGAA CTAGTAATCT CTATTTTAGT CCGGACGTAT CTATTAAGCC GAAGCAAATA 
AAGGATAATC AAAAACTTAG GACAAAAGAG GTCAATACCA ACAACTATTA GCAGTCACAC 
TCGCAAGAAT AAGAGAGAAG GGACCAAAAA AGTCAAATAG GAGAAATCAA AACAAAAGGT
ACAGAACACC AGAACAACAA AATCAAAACA TCCAACTCAC TCAAAACAAA AATTCCAAAA 
GAGACCGGCA ACACAACAAG CACTGAACAC AATGCCAACT TCAATACTGC TAATTATTAC
AACCATGATC ATGGCATCTT TCTGCCAAAT AGATATCACA AAACTACAGC ACGTAGGTGT 
ATTGGTCAAC AGTCCCAAAG GGATGAAGAT ATCACAAAAC TTTGAAACAA GATATCTAAT 
TTTGAGCCTC ATACCAAAAA TAGAAGACTC TAACTCTTGT GGTGACCAAC AGATCAAGCA 
ATACAAGAAG TTATTGGATA GACTGATCAT CCCTTTATAT GATGGATTAA GATTACAGAA
AGATGTGATA GTAACCAATC AAGAATCCAA TGAAAACACT GATCCCAGAA CAAAACGATT 
CTTTGGAGGG GTAATTGGAA CCATTGCTCT GGGAGTAGCA ACCTCAGCAC AAATTACAGC
GGCAGTTGCT CTGGTTGAAG CCAAGCAGGC AAGATCAGAC ATCGAAAAAC TCAAAGAAGC 
AATTAGGGAC ACAAATAAAG CAGTGCAGTC AGTTCAGAGC TCCATAGGAA ATTTAATAGT 
AGCAATTAAA TCAGTCCAGG ATTATGTTAA CAAAGAAATC GTGCCATCGA TTGCGAGGCT 
AGGTTGTGAA GCAGCAGGAC TTCAATTAGG AATTGCATTA ACACAGCATT ACTCAGAATT 
AACAAACATA TTTGGTGATA ACATAGGATC GTTACAAGAA AAAGGAATAA AATTACAAGG 
TATAGCATCA TTATACCGCA CAAATATCAC AGAAATATTC ACAACATCAA CAGTTGATAA 
ATATGATATC TATGATCTGT TATTTACAGA ATCAATAAAG GTGAGAGTTA TAGATGTTGA 
CTTGAATGAT TACTCAATCA CCCTCCAAGT CAGACTCCCT TTATTAACTA GGCTGCTGAA 
CACTCAGATC TACAAAGTAG ATTCCATATC ATATAACATC CAAAACAGAG AATGGTATAT
CCCTCTTCCC AGCCATATCA TGACGAAAGG GGCATTTCTA GGTGGAGCAG ACGTCAAAGA
ATGTATAGAA GCATTCAGCA GCTATATATG CCCTTCTGAT CCAGGATTTG TATTAAACCA
TGAAATAGAG AGCTGCTTAT CAGGAAACAT ATCCCAATGT CCAAGAACAA CGGTCACATC
AGACATTGTT CCAAGATATG CATTTGTCAA TGGAGGAGTG GTTGCAAACT GTATAACAAC
CACCTGTACA TGCAACGGAA TTGGTAATAG AATCAATCAA CCACCTGATC AAGGAGTAAA 
AATTATAACA CATAAAGAAT GTAGTACAGT AGGTATCAAC GGAATGCTGT TCAATACAAA
TAAAGAAGGA ACTCTTGCAT TCTATACACC AAATGATATA ACACTAAACA ATTCTGTTAC 
ACTTGATCCA ATTGACATAT CAATCGAGCT CAACAAGGCC AAATCAGATC TAGAAGAATC 
AAAAGAATGG ATAAGAAGGT CAAATCAAAA ACTAGATTCT ATTGGAAATT GGCATCAATC 
TAGCACTACA ATCATAATTA TTTTGATAAT GATCATTATA TTGTTTATAA TTAATATAAC 
GATAATTACA ATTGCAATTA AGTATTACAG AATTCAAAAG AGAAATCGAG TGGATCAAAA 
TGACAAGCCA TATGTACTAA CAAACAAATA ACATATCTAC AGATCATTAG ATATTAAAAT 
TATAAAAAAC TTAGGAGTAA AGTTACGCAA TCCAACTCTA CTCATATAAT TGAGGAAGGA
CCCAATAGAC AAATCCAAAT TCGAGATGGA ATACTGGAAG CATACCAATC ACGGAAAGGA 
TGCTGGCAAT GAGCTGGAGA CGTCTATGGC TACTCATGGC AACAAGCTCA CTAATAAGAT 
AATATACATA TTATGGACAA TAATCCTGGT GTTATTATCA ATAGTCTTCA TCATAGTGCT
AATTAATTCC ATCAAAAGTG AAAAGGCCCA CGAATCATTG CTGCAAGACA TAAATAATGA
GTTTATGGAA ATTACAGAAA AGATCCAAAT GGCATCGGAT AATACCAATG ATCTAATACA 
GTCAGGAGTG AATACAAGGC TTCTTACAAT TCAGAGTCAT GTCCAGAATT ACATACCAAT 
ATCATTGACA CAACAGATGT CAGATCTTAG GAAATTCATT AGTGAAATTA CAATTAGAAA 
TGATAATCAA GAAGTGCTGC CACAAAGAAT AACACATGAT GTAGGTATAA AACCTTTAAA 
TCCAGATGAT TTTTGGAGAT GCACGTCTGG TCTTCCATCT TTAATGAAAA CTCCAAAAAT 
AAGGTTAATG CCAGGGCCGG GATTATTAGC TATGCCAACG ACTGTTGATG GCTGTGTTAG 
AACTCCGTCT TTAGTTATAA ATGATCTGAT TTATGCTTAT ACCTCAAATC TAATTACTCG 
AGGTTGTCAG GATATAGGAA AATCATATCA AGTCTTACAG ATAGGGATAA TAACTGTAAA 
CTCAGACTTG GTACCTGACT TAAATCCTAG GATCTCTCAT ACCTTTAACA TAAATGACAA 
TAGGAAGTCA TGTTCTCTAG CACTCCTAAA TACAGATGTA TATCAACTGT GTTCAACTCC
CAAAGTTGAT GAAAGATCAG ATTATGCATC ATCAGGCATA GAAGATATTG TACTTGATAT
TGTCAATTAT GATGGTTCAA TCTCAACAAC AAGATTTAAG AATAATAACA TAAGCTTTGA 
TCAACCATAT GCTGCACTAT ACCCATCTGT TGGACCAGGG ATATACTACA AAGGCAAAAT 
AATATTTCTC GGGTATGGAG GTCTTGAACA TCCAATAAAT GAGAATGTAA TCTGCAACAC 
AACTGGGTGC CCCGGGAAAA CACAGAGAGA CTGTAATCAA GCGTCTCATA GTCCATGGTT 
TTCAGATAGG AGGATGGTCA ACTCCATCAT TGTTGCTGAC AAAGGCTTAA ACTCAATTCC
AAAATTGAAA GTATGGACGA TATCTATGCG ACAAAATTAC TGGGGGTCAG AAGGAAGGTT 
ACTTCTACTA GGTAACAAGA TCTATATATA TACAAGATCT ACAAGTTGGC ATAGCAAGTT 
ACAATTAGGA ATAATTGATA TTACTGATTA CAGTGATATA AGGATAAAAT GGACATGGCA 
TAATGTGCTA TCAAGACCAG GAAACAATGA ATGTCCATGG GGACATTCAT GTCCAGATGG 
ATGTATAACA GGAGTATATA CTGATGCATA TCCACTCAAT CCCACAGGGA GCATTGTGTC 
ATCTGTCATA TTAGACTCAC AAAAATCGAG AGTGAACCCA GTCATAACTT ACTCAACAGC 
AACCGAAAGA GTAAACGAGC TGGCCATCCT AAACAGAACA CTCTCAGCTG GATATACAAC 
AACAAGCTGC ATTACACACT ATAACAAAGG ATATTGTTTT CATATAGTAG AAATAAATCA 
TAAAAGCTTA AACACATTTC AACCCATGTT GTTCAAAACA GAGATTCCAA AAAGCTGCAG 
TTAATCATAA TTAACCATAA TATGCATCAA TCTATCTATA ATACAAGTAT ATGATAAGTA
ATCAGCAATC AGACAATAGA CAAAAGGGAA ATATAAAAAA CTTAGGAGCA AAGCGTGCTC 
GGGAAATGGA CACTGAATCT AACAATGGCA CTGTATCTGA CATACTCTAT CCTGAGTGTC 
ACCTTAACTC TCCTATCGTT AAAGGTAAAA TAGCACAATT ACACACTATT ATGAGTCTAC 
CTCAGCCTTA TGATATGGAT GACGACTCAA TACTAGTTAT CACTAGACAG AAAATAAAAC
TTAATAAATT GGATAAAAGA CAACGATCTA TTAGAAGATT AAAATTAATA TTAACTGAAA 
AAGTGAATGA CTTAGGAAAA TACACATTTA TCAGATATCC AGAAATGTCA AAAGAAATGT 
TCAAATTATA TATACCTGGT ATTAACAGTA AAGTGACTGA ATTATTACTT AAAGCAGATA 
GAACATATAG TCAAATGACT GATGGATTAA GAGATCTATG GATTAATGTG CTATCAAAAT 
TAGCCTCAAA AAATGATGGA AGCAATTATG ATCTTAATGA AGAAATTAAT AATATATCGA 
AAGTTCACAC AACCTATAAA TCAGATAAAT GGTATAATCC ATTCAAAACA TGGTTTACTA
TCAAGTATGA TATGAGAAGA TTACAAAAAG CTCGAAATGA GATCACTTTT AATGTTGGGA
AGGATTATAA CTTGTTAGAA GACCAGAAGA ATTTCTTATT GATACATCCA GAATTGGTTT 
TGATATTAGA TAAACAAAAC TACAATGGTT ATCTAATTAC TCCTGAATTA GTATTGATGT 
ATTGTGACGT AGTCGAAGGC CGATGGAATA TAAGTGCATG TGCTAAGTTA GATCCAAAAT 
TACAATCTAT GTATCAGAAA GGTAATAACC TGTGGGAAGT GATAGATAAA TTGTTTCCAA 
TTATGGGAGA AAAGACATTT GATGTGATAT CGTTATTAGA ACCACTTGCA TTATCCTTAA 
TTCAAACTCA TGATCCTGTT AAACAACTAA GAGGAGCTTT TTTAAATCAT GTGTTATCCG 
AGATGGAATT AATATTTGAA TCTAGAGAAT CGATTAAGGA ATTTCTGAGT GTAGATTACA
TTGATAAAAT TTTAGATATA TTTAATAAGT CTACAATAGA TGAAATAGCA GAGATTTTCT 
CTTTTTTTAG AACATTTGGG CATCCTCCAT TAGAAGCTAG TATTGCAGCA GAAAAGGTTA 
GAAAATATAT GTATATTGGA AAACAATTAA AATTTGACAC TATTAATAAA TGTCATGCTA 
TCTTCTGTAC AATAATAATT AACGGATATA GAGAGAGGCA TGGTGGACAG TGGCCTCCTG 
TGACATTACC TGATCATGCA CACGAATTCA TCATAAATGC TTACGGTTCA AACTCTGCGA 
TATCATATGA GAATGCTGTT GATTATTACC AGAGCTTTAT AGGAATAAAA TTCAATAAAT
TCATAGAGCC TCAGTTAGAT GAGGATTTGA CAATTTATAT GAAAGATAAA GCATTATCTC 
CAAAAAAATC AAATTGGGAC ACAGTTTATC CTGCATCTAA TTTACTGTAC CGTACTAACG 
CATCCAACGA ATCACGAAGA TTAGTTGAAG TATTTATAGC AGATAGTAAA TTTGATCCTC 
ATCAGATATT GGATTATGTA GAATCTGGGG ACTGGTTAGA TGATCCAGAA TTTAATATTT 
CTTATAGTCT TAAAGAAAAA GAGATCAAAC AGGAAGGTAG ACTCTTTGCA AAAATGACAT 
ACAAAATGAG AGCTACACAA GTTTTATCAG AGACACTACT TGCAAATAAC ATAGGAAAAT 
TCTTTCAAGA AAATGGGATG GTGAAGGGAG AGATTGAATT ACTTAAGAGA TTAACAACCA 
TATCAATATC AGGAGTTCCA CGGTATAATG AAGTGTACAA TAATTCTAAA AGCCATACAG 
ATGACCTTAA AACCTACAAT AAAATAAGTA ATCTTAATTT GTCTTCTAAT CAGAAATCAA 
AGAAATTTGA ATTCAAGTCA ACGGATATCT ACAATGATGG ATACGAGACT GTGAGCTGTT 
TCCTAACAAC AGATCTCAAA AAATACTGTC TTAATTGGAG ATATGAATCA ACAGCTCTAT
TTGGAGAAAC TTGCAACCAA ATATTTGGAT TAAATAAATT GTTTAATTGG TTACACCCTC
GTCTTGAAGG AAGTACAATC TATGTAGGTG ATCCTTACTG TCCTCCATCA GATAAAGAAC
ATATATCATT AGAGGATCAC CCTGATTCTG GTTTTTACGT TCATAACCCA AGAGGGGGTA 
TAGAAGGATT TTGTCAAAAA TTATGGACAC TCATATCTAT AAGTGCAATA CATCTAGCAG 
CTGTTAGAAT AGGCGTGAGG GTGACTGCAA TGGTTCAAGG AGACAATCAA GCTATAGCTG 
TAACCACAAG AGTACCCAAC AATTATGACT ACAGAGTTAA GAAGGAGATA GTTTATAAAG 
ATGTAGTGAG ATTTTTTGAT TCATTAAGAG AAGTGATGGA TGATCTAGGT CATGAACTTA
AATTAAATGA AACGATTATA AGTAGCAAGA TGTTCATATA TAGCAAAAGA ATCTATTATG 
ATGGGAGAAT TCTTCCTCAA GCTCTAAAAG CATTATCTAG ATGTGTCTTC TGGTCAGAGA 
CAGTAATAGA CGAAACAAGA TCAGCATCTT CAAATTTGGC AACATCATTT GCAAAAGCAA 
TTGAGAATGG TTATTCACCT GTTCTAGGAT ATGCATGCTC AATTTTTAAG AACATTCAAC 
AACTATATAT TGCCCTTGGG ATGAATATCA ATCCAACTAT AACACAGAAT ATCAGAGATC 
AGTATTTTAG GAATCCAAAT TGGATGCAAT ATGCCTCTTT AATACCTGCT AGTGTTGGGG 
GATTCAATCA CATGGCCATG TCAAGATGTT TTGTAAGGAA TATTGGTGAT CCATCAGTTG 
CCGCATTGGC TGATATTAAA AGATTTATTA AGGCGAATCT ATTAGACCGA AGTGTTCTTT 
ATAGGATTAT GAATCAAGAA CCAGGTGAGT CATCTTTTTT TGACTGGGCT TCAGATCCAT 
ATTCATGCAA TTTACCACAA TCTCAAAATA TAACCACCAT GATAAAAAAT ATAACAGCAA 
GGAATGTATT ACAAGATTCA CCAAATCCAT TATTATCTGG ATTATTCACA AATACAATGA 
TAGAAGAAGA TGAAGAATTA GCTGAGTTCC TGATGGACAG GAAGGTAATT CTCCCTAGAG 
TTGCACATGA TATTCTAGAT AATTCTCTCA CAGGAATTAG AAATGCCATA GCTGGAATGT 
TAGATACGAC AAAATCACTA ATTCGGGTTG GCATAAATAG AGGAGGACTG ACATATAGTT 
TGTTGAGGAA AATCAGTAAT TACGATCTAG TACAATATGA AACACTAAGT AGGACTTTGC 
GACTAATTGT AAGTGATAAA ATCAAGTATG AAGATATGTG TTCGGTAGAC CTTGCCATAG
CATTGCGACA AAAGATGTGG ATTCATTTAT CAGGAGGAAG GATGATAAGT GGACTTGAAA 
CGCCTGACCC ATTAGAATTA CTATCTGGGG TAGTAATAAC AGGATCAGAA CATTGTAAAA 
TATGTTATTC TTCAGATGGC ACAAACCCAT ATACTTGGAT GTATTTACCC GGTAATATCA 
AAATAGGATC AGCAGAAACA GGTATATCGT CATTAAGAGT TCCTTATTTT GGATCAGTCA
CTGATGAAAG ATCTGAAGCA CAATTAGGAT ATATCAAGAA TCTTAGTAAA CCTGCAAAAG 12360

CCGCAATAAG AATAGCAATG ATATATACAT GGGCATTTGG TAATGATGAG ATATCTTGGA 12420 

TGGAAGCCTC ACAGATAGCA CAAACACGTG CAAATTTTAC ACTAGATAGT CTCAAAATTT 12480

TAACACCGGT AGCTACATCA ACAAATTTAT CACACAGATT AAAGGATACT GCAACTCAGA 12540 

TGAAATTCTC CAGTACATCA TTGATCAGAG TCAGCAGATT TATAACAATG TCCAATGATA 12600 

ACATGTCTAT CAAAGAAGCT AATGAAACCA AAGATACTAA TCTTATTTAT CAACAAATAA 12660 

TGTTAACAGG ATTAAGTGTT TTCGAATATT TATTTAGATT AAAAGAAACC ACAGGACACA 12720 

ACCCTATAGT TATGCATCTG CACATAGAAG ATGAGTGTTG TATTAAAGAA AGTTTTAATG 12780 

ATGAACATAT TAATCCAGAG TCTACATTAG AATTAATTCG ATATCCTGAA AGTAATGAAT 12840 

TTATTTATGA TAAAGACCCA CTCAAAGATG TGGACTTATC AAAACTTATG GTTATTAAAG 12900 

ACCATTCTTA CACAATTGAT ATGAATTATT GGGATGATAC TGACATCATA CATGCAATTT 12960 

CAATATGTAC TGCAATTACA ATAGCAGATA CTATGTCACA ATTAGATCGA GATAATTTAA 13020 

AAGAGATAAT AGTTATTGCA AATGATGATG ATATTAATAG CTTAATCACT GAATTTTTGA 13080 

CTCTTGACAT ACTTGTATTT CTCAAGACAT TTGGTGGATT ATTAGTAAAT CAATTTGCAT 13140

ACACTCTTTA TAGTCTAAAA ATAGAAGGTA GGGATCTCAT TTGGGATTAT ATAATGAGAA 13200 

CACTGAGAGA TACTTCCCAT TCAATATTAA AAGTATTATC TAATGCATTA TCTCATCCTA 13260

AAGTATTCAA GAGGTTCTGG GATTGTGGAG TTTTAAACCC TATTTATGGT CCTAATATTG 13320 

CTAGTCAAGA CCAGATAAAA CTTGCCCTAT CTATATGTGA ATATTCACTA GATCTATTTA 13380 

TGAGAGAATG GTTGAATGGT GTATCACTTG AAATATACAT TTGTGACAGC GATATGGAAG 13440 

TTGCAAATGA TAGGAAACAA GCCTTTATTT CTAGACACCT TTCATTTGTT TGTTGTTTAG 13500 

CAGAAATTGC ATCTTTCGGA CCTAACCTGT TAAACTTAAC ATACTTGGAG AGACTTGATC 13560 

TATTGAAACA ATATCTTGAA TTAAATATTA AAGAAGACCC TACTCTTAAA TATGTACAAA 13620 

TATCTGGATT ATTAATTAAA TCGTTCCCAT CAACTGTAAC ATACGTAAGA AAGACTGCAA 13680 

TCAAATATCT AAGGATTCGC GGTATTAGTC CACCTGAGGT AATTGATGAT TGGGATCCGG 13740 

TAGAAGATGA AAATATGCTG GATAACATTG TCAAAACTAT AAATGATAAC TGTAATAAAG 13800 

ATAATAAAGG GAATAAAATT AACAATTTCT GGGGACTAGC ACTTAAGAAC TATCAAGTCC 13860
TTAAAATCAG ATCTATAACA AGTGATTCTG ATGATAATGA TAGACTAGAT GCTAATACAA 13920
GTGGTTTGAC ACTTCCTCAA GGAGGGAATT ATCTATCGCA TCAATTGAGA TTATTCGGAA 13980
TCAACAGCAC TAGTTGTCTG AAAGCTCTTG AGTTATCACA AATTTTAATG AAGGAAGTCA 14040
ATAAAGACAA GGACAGGCTC TTCCTGGGAG AAGGAGCAGG AGCTATGCTA GCATGTTATG 14100
ATGCCACATT AGGACCTGCA GTTAATTATT ATAATTCAGG TTTGAATATA ACAGATGTAA 14160
TTGGTCAACG AGAATTGAAA ATATTTCCTT CAGAGGTATC ATTAGTAGGT AAAAAATTAG 14220
GAAATGTGAC ACAGATTCTT AACAGGGTAA AAGTACTGTT CAATGGGAAT CCTAATTCAA 14280 
CATGGATAGG AAATATGGAA TGTGAGAGCT TAATATGGAG TGAATTAAAT GATAAGTCCA 14340 
TTGGATTAGT ACATTGTGAT ATGGAAGGAG CTATCGGTAA ATCAGAAGAA ACTGTTCTAC 14400 

ATGAACATTA TAGTGTTATA AGAATTACAT ACTTGATTGG GGATGATGAT GTTGTTTTAG 14460

TTTCCAAAAT TATACCTACA ATCACTCCGA ATTGGTCTAG AATACTTTAT CTATATAAAT 14520

TATATTGGAA AGATGTAAGT ATAATATCAC TCAAAACTTC TAATCCTGCA TCAACAGAAT 14580 

TATATCTAAT TTCGAAAGAT GCATATTGTA CTATAATGGA ACCTAGTGAA ATTGTTTTAT 14640 

CAAAACTTAA AAGATTGTCA CTCTTGGAAG AAAATAATCT ATTAAAATGG ATCATTTTAT 14700 

CAAAGAAGAG GAATAATGAA TGGTTACATC ATGAAATCAA AGAAGGAGAA AGAGATTATG 14760 

GAATCATGAG ACCATATCAT ATGGCACTAC AAATCTTTGG ATTTCAAATC AATTTAAATC 14820 

ATCTGGCGAA AGAATTTTTA TCAACCCCAG ATCTGACTAA TATCAACAAT ATAATCCAAA 14880

GTTTTCAGCG AACAATAAAG GATGTTTTAT TTGAATGGAT TAATATAACT CATGATGATA 14940

AGAGACATAA ATTAGGCGGA AGATATAACA TATTCCCACT GAAAAATAAG GGAAAGTTAA 15000 

GACTGCTATC GAGAAGACTA GTATTAAGTT GGATTTCATT ATCATTATCG ACTCGATTAC 15060 

TTACAGGTCG CTTTCCTGAT GAAAAATTTG AACATAGAGC ACAGACTGGA TATGTATCAT 15120 

TAGCTGATAC TGATTTAGAA TCATTAAAGT TATTGTCGAA AAACATCATT AAGAATTACA 15180 

GAGAGTGTAT AGGATCAATA TCATATTGGT TTCTAACCAA AGAAGTTAAA ATACTTATGA 15240 

AATTGATTGG TGGTGCTAAA TTATTAGGAA TTCCCAGACA ATATAAAGAA CCCGAAGACC 15300 

AGTTATTAGA AAACTACAAT CAACATGATG AATTTGATAT CGATTAAAAC ATAAATACAA 15360 

TGAAGATATA TCCTAACCTT TATCTTTAAG CCTAGGAATA GACAAAAAGT AAGAAAAACA 15420

TGTAATATAT ATATACCAAA CAGAGTTCTT CTCTTGTTTG GT 15462



(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2233 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

Met Asp Thr Glu Ser Asn Asn Gly Thr Val Ser Asp Ile Leu Tyr Pro
1 5 10 15

Glu Cys His Leu Asn Ser Pro Ile Val Lys Gly Lys Ile Ala Gln Leu
20 25 30

His Thr Ile Met Ser Leu Pro Gln Pro Tyr Asp Met Asp Asp Asp Ser
35 40 45

Ile Leu Val Ile Thr Arg Gln Lys Ile Lys Leu Asn Lys Leu Asp Lys
50 55 60

Arg Gln Arg Ser Ile Arg Arg Leu Lys Leu Ile Leu Thr Glu Lys Val
65 70 75 80

Asn Asp Leu Gly Lys Tyr Thr Phe Ile Arg Tyr Pro Glu Met Ser Lys
85 90 95

Glu Met Phe Lys Leu Tyr Ile Pro Gly Ile Asn Ser Lys Val Thr Glu
100 105 110

Leu Leu Leu Lys Ala Asp Arg Thr Tyr Ser Gln Met Thr Asp Gly Leu
115 120 125

Arg Asp Leu Trp Ile Asn Val Leu Ser Lys Leu Ala Ser Lys Asn Asp
130 135 140

Gly Ser Asn Tyr Asp Leu Asn Glu Glu Ile Asn Asn Ile Ser Lys Val
145 150 155 160

His Thr Thr Tyr Lys Ser Asp Lys Trp Tyr Asn Pro Phe Lys Thr Trp
165 170 175

Phe Thr Ile Lys Tyr Asp Met Arg Arg Leu Gln Lys Ala Arg Asn Glu
180 185 190

Ile Thr Phe Asn Val Gly Lys Asp Tyr Asn Leu Leu Glu Asp Gln Lys
195 200 205

Asn Phe Leu Leu Ile His Pro Glu Leu Val Leu Ile Leu Asp Lys Gln
210 215 220

Asn Tyr Asn Gly Tyr Leu Ile Thr Pro Glu Leu Val Leu Met Tyr Cys
225 230 235 240

Asp Val Val Glu Gly Arg Trp Asn Ile Ser Ala Cys Ala Lys Leu Asp
245 250 255

Pro Lys Leu Gln Ser Met Tyr Gln Lys Gly Asn Asn Leu Trp Glu Val
260 265 270

Ile Asp Lys Leu Phe Pro Ile Met Gly Glu Lys Thr Phe Asp Val Ile
275 280 285

Ser Leu Leu Glu Pro Leu Ala Leu Ser Leu Ile Gln Thr His Asp Pro
290 295 300

Val Lys Gln Leu Arg Gly Ala Phe Leu Asn His Val Leu Ser Glu Met
305 310 315 320

Glu Leu Ile Phe Glu Ser Arg Glu Ser Ile Lys Glu Phe Leu Ser Val
325 330 335

Asp Tyr Ile Asp Lys Ile Leu Asp Ile Phe Asn Lys Ser Thr Ile Asp
340 345 350

Glu Ile Ala Glu Ile Phe Ser Phe Phe Arg Thr Phe Gly His Pro Pro
355 360 365

Leu Glu Ala Ser Ile Ala Ala Glu Lys Val Arg Lys Tyr Met Tyr Ile
370 375 380

Gly Lys Gln Leu Lys Phe Asp Thr Ile Asn Lys Cys His Ala Ile Phe
385 390 395 400

Cys Thr Ile Ile Ile Asn Gly Tyr Arg Glu Arg His Gly Gly Gln Trp
405 410 415

Pro Pro Val Thr Leu Pro Asp His Ala His Glu Phe Ile Ile Asn Ala
420 425 430

Tyr Gly Ser Asn Ser Ala Ile Ser Tyr Glu Asn Ala Val Asp Tyr Tyr
435 440 445

Gln Ser Phe Ile Gly Ile Lys Phe Asn Lys Phe Ile Glu Pro Gln Leu
450 455 460

Asp Glu Asp Leu Thr Ile Tyr Met Lys Asp Lys Ala Leu Ser Pro Lys
465 470 475 480

Lys Ser Asn Trp Asp Thr Val Tyr Pro Ala Ser Asn Leu Leu Tyr Arg
485 490 495

Thr Asn Ala Ser Asn Glu Ser Arg Arg Leu Val Glu Val Phe Ile Ala
500 505 510

Asp Ser Lys Phe Asp Pro His Gln Ile Leu Asp Tyr Val Glu Ser Gly
515 520 525

Asp Trp Leu Asp Asp Pro Glu Phe Asn Ile Ser Tyr Ser Leu Lys Glu
530 535 540

Lys Glu Ile Lys Gln Glu Gly Arg Leu Phe Ala Lys Met Thr Tyr Lys
545 550 555 560

Met Arg Ala Thr Gln Val Leu Ser Glu Thr Leu Leu Ala Asn Asn Ile
565 570 575

Gly Lys Phe Phe Gln Glu Asn Gly Met Val Lys Gly Glu Ile Glu Leu
580 585 590

Leu Lys Arg Leu Thr Thr Ile Ser Ile Ser Gly Val Pro Arg Tyr Asn
595 600 605

Glu Val Tyr Asn Asn Ser Lys Ser His Thr Asp Asp Leu Lys Thr Tyr
610 615 620

Asn Lys Ile Ser Asn Leu Asn Leu Ser Ser Asn Gln Lys Ser Lys Lys
625 630 635 640

Phe Glu Phe Lys Ser Thr Asp Ile Tyr Asn Asp Gly Tyr Glu Thr Val
645 650 655

Ser Cys Phe Leu Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Ser Thr Ala Leu Phe Gly Glu Thr Cys Asn Gln Ile Phe Gly
675 680 685

Leu Asn Lys Leu Phe Asn Trp Leu His Pro Arg Leu Glu Gly Ser Thr
690 695 700

Ile Tyr Val Gly Asp Pro Tyr Cys Pro Pro Ser Asp Lys Glu His Ile
705 710 715 720

Ser Leu Glu Asp His Pro Asp Ser Gly Phe Tyr Val His Asn Pro Arg
725 730 735

Gly Gly Ile Glu Gly Phe Cys Gln Lys Leu Trp Thr Leu Ile Ser Ile
740 745 750

Ser Ala Ile His Leu Ala Ala Val Arg Ile Gly Val Arg Val Thr Ala
755 760 765

Met Val Gln Gly Asp Asn Gln Ala Ile Ala Val Thr Thr Arg Val Pro
770 775 780

Asn Asn Tyr Asp Tyr Arg Val Lys Lys Glu Ile Val Tyr Lys Asp Val
785 790 795 800

Val Arg Phe Phe Asp Ser Leu Arg Glu Val Met Asp Asp Leu Gly His
805 810 815

Glu Leu Lys Leu Asn Glu Thr Ile Ile Ser Ser Lys Met Phe Ile Tyr
820 825 830

Ser Lys Arg Ile Tyr Tyr Asp Gly Arg Ile Leu Pro Gln Ala Leu Lys
835 840 845

Ala Leu Ser Arg Cys Val Phe Trp Ser Glu Thr Val Ile Asp Glu Thr
850 855 860

Arg Ser Ala Ser Ser Asn Leu Ala Thr Ser Phe Ala Lys Ala Ile Glu
865 870 875 880

Asn Gly Tyr Ser Pro Val Leu Gly Tyr Ala Cys Ser Ile Phe Lys Asn
885 890 895

Ile Gln Gln Leu Tyr Ile Ala Leu Gly Met Asn Ile Asn Pro Thr Ile
900 905 910

Thr Gln Asn Ile Arg Asp Gln Tyr Phe Arg Asn Pro Asn Trp Met Gln
915 920 925

Tyr Ala Ser Leu Ile Pro Ala Ser Val Gly Gly Phe Asn His Met Ala
930 935 940

Met Ser Arg Cys Phe Val Arg Asn Ile Gly Asp Pro Ser Val Ala Ala
945 950 955 960

Leu Ala Asp Ile Lys Arg Phe Ile Lys Ala Asn Leu Leu Asp Arg Ser
965 970 975

Val Leu Tyr Arg Ile Met Asn Gln Glu Pro Gly Glu Ser Ser Phe Phe
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Cys Asn Leu Pro Gln Ser Gln Asn
995 1000 1005

Ile Thr Thr Met Ile Lys Asn Ile Thr Ala Arg Asn Val Leu Gln Asp






WO 98/13501 



PCT/US97/16718 






1010 1015 1020 

Ser Pro Asn Pro Leu Leu Ser Gly Leu Phe Thr Asn Thr Met Ile Glu
1025 1030 1035 1040

Glu Asp Glu Glu Leu Ala Glu Phe Leu Met Asp Arg Lys Val Ile Leu
1045 1050 1055

Pro Arg Val Ala His Asp Ile Leu Asp Asn Ser Leu Thr Gly Ile Arg
1060 1065 1070

Asn Ala Ile Ala Gly Met Leu Asp Thr Thr Lys Ser Leu Ile Arg Val
1075 1080 1085

Gly Ile Asn Arg Gly Gly Leu Thr Tyr Ser Leu Leu Arg Lys Ile Ser
1090 1095 1100

Asn Tyr Asp Leu Val Gln Tyr Glu Thr Leu Ser Arg Thr Leu Arg Leu
1105 1110 1115 1120

Ile Val Ser Asp Lys Ile Lys Tyr Glu Asp Met Cys Ser Val Asp Leu
1125 1130 1135

Ala Ile Ala Leu Arg Gln Lys Met Trp Ile His Leu Ser Gly Gly Arg
1140 1145 1150

Met Ile Ser Gly Leu Glu Thr Pro Asp Pro Leu Glu Leu Leu Ser Gly
1155 1160 1165

Val Val Ile Thr Gly Ser Glu His Cys Lys Ile Cys Tyr Ser Ser Asp
1170 1175 1180

Gly Thr Asn Pro Tyr Thr Trp Met Tyr Leu Pro Gly Asn Ile Lys Ile
1185 1190 1195 1200

Gly Ser Ala Glu Thr Gly Ile Ser Ser Leu Arg Val Pro Tyr Phe Gly
1205 1210 1215

Ser Val Thr Asp Glu Arg Ser Glu Ala Gln Leu Gly Tyr Ile Lys Asn
1220 1225 1230

Leu Ser Lys Pro Ala Lys Ala Ala Ile Arg Ile Ala Met Ile Tyr Thr
1235 1240 1245

Trp Ala Phe Gly Asn Asp Glu Ile Ser Trp Met Glu Ala Ser Gln Ile
1250 1255 1260

Ala Gln Thr Arg Ala Asn Phe Thr Leu Asp Ser Leu Lys Ile Leu Thr
1265 1270 1275 1280

Pro Val Ala Thr Ser Thr Asn Leu Ser His Arg Leu Lys Asp Thr Ala
1285 1290 1295

Thr Gln Met Lys Phe Ser Ser Thr Ser Leu Ile Arg Val Ser Arg Phe
1300 1305 1310

Ile Thr Met Ser Asn Asp Asn Met Ser Ile Lys Glu Ala Asn Glu Thr
1315 1320 1325

Lys Asp Thr Asn Leu Ile Tyr Gln Gln Ile Met Leu Thr Gly Leu Ser
1330 1335 1340

Val Phe Glu Tyr Leu Phe Arg Leu Lys Glu Thr Thr Gly His Asn Pro
1345 1350 1355 1360

Ile Val Met His Leu His Ile Glu Asp Glu Cys Cys Ile Lys Glu Ser
1365 1370 1375

Phe Asn Asp Glu His Ile Asn Pro Glu Ser Thr Leu Glu Leu Ile Arg
1380 1385 1390

Tyr Pro Glu Ser Asn Glu Phe Ile Tyr Asp Lys Asp Pro Leu Lys Asp
1395 1400 1405

Val Asp Leu Ser Lys Leu Met Val Ile Lys Asp His Ser Tyr Thr Ile
1410 1415 1420

Asp Met Asn Tyr Trp Asp Asp Thr Asp Ile Ile His Ala Ile Ser Ile
1425 1430 1435 1440

Cys Thr Ala Ile Thr Ile Ala Asp Thr Met Ser Gln Leu Asp Arg Asp
1445 1450 1455

Asn Leu Lys Glu Ile Ile Val Ile Ala Asn Asp Asp Asp Ile Asn Ser
1460 1465 1470

Leu Ile Thr Glu Phe Leu Thr Leu Asp Ile Leu Val Phe Leu Lys Thr
1475 1480 1485

Phe Gly Gly Leu Leu Val Asn Gln Phe Ala Tyr Thr Leu Tyr Ser Leu
1490 1495 1500

Lys Ile Glu Gly Arg Asp Leu Ile Trp Asp Tyr Ile Met Arg Thr Leu
1505 1510 1515 1520

Arg Asp Thr Ser His Ser Ile Leu Lys Val Leu Ser Asn Ala Leu Ser
1525 1530 1535

His Pro Lys Val Phe Lys Arg Phe Trp Asp Cys Gly Val Leu Asn Pro
1540 1545 1550

Ile Tyr Gly Pro Asn Ile Ala Ser Gln Asp Gln Ile Lys Leu Ala Leu
1555 1560 1565

Ser Ile Cys Glu Tyr Ser Leu Asp Leu Phe Met Arg Glu Trp Leu Asn
1570 1575 1580

Gly Val Ser Leu Glu Ile Tyr Ile Cys Asp Ser Asp Met Glu Val Ala
1585 1590 1595 1600

Asn Asp Arg Lys Gln Ala Phe Ile Ser Arg His Leu Ser Phe Val Cys
1605 1610 1615

Cys Leu Ala Glu Ile Ala Ser Phe Gly Pro Asn Leu Leu Asn Leu Thr
1620 1625 1630

Tyr Leu Glu Arg Leu Asp Leu Leu Lys Gln Tyr Leu Glu Leu Asn Ile
1635 1640 1645

Lys Glu Asp Pro Thr Leu Lys Tyr Val Gln Ile Ser Gly Leu Leu Ile
1650 1655 1660

Lys Ser Phe Pro Ser Thr Val Thr Tyr Val Arg Lys Thr Ala Ile Lys
1665 1670 1675 1680

Tyr Leu Arg Ile Arg Gly Ile Ser Pro Pro Glu Val Ile Asp Asp Trp
1685 1690 1695

Asp Pro Val Glu Asp Glu Asn Met Leu Asp Asn Ile Val Lys Thr Ile
1700 1705 1710

Asn Asp Asn Cys Asn Lys Asp Asn Lys Gly Asn Lys Ile Asn Asn Phe
1715 1720 1725

Trp Gly Leu Ala Leu Lys Asn Tyr Gln Val Leu Lys Ile Arg Ser Ile
1730 1735 1740

Thr Ser Asp Ser Asp Asp Asn Asp Arg Leu Asp Ala Asn Thr Ser Gly
1745 1750 1755 1760

Leu Thr Leu Pro Gln Gly Gly Asn Tyr Leu Ser His Gln Leu Arg Leu
1765 1770 1775

Phe Gly Ile Asn Ser Thr Ser Cys Leu Lys Ala Leu Glu Leu Ser Gln
1780 1785 1790

Ile Leu Met Lys Glu Val Asn Lys Asp Lys Asp Arg Leu Phe Leu Gly
1795 1800 1805

Glu Gly Ala Gly Ala Met Leu Ala Cys Tyr Asp Ala Thr Leu Gly Pro
1810 1815 1820

Ala Val Asn Tyr Tyr Asn Ser Gly Leu Asn Ile Thr Asp Val Ile Gly
1825 1830 1835 1840

Gln Arg Glu Leu Lys Ile Phe Pro Ser Glu Val Ser Leu Val Gly Lys
1845 1850 1855

Lys Leu Gly Asn Val Thr Gln Ile Leu Asn Arg Val Lys Val Leu Phe
1860 1865 1870

Asn Gly Asn Pro Asn Ser Thr Trp Ile Gly Asn Met Glu Cys Glu Ser
1875 1880 1885

Leu Ile Trp Ser Glu Leu Asn Asp Lys Ser Ile Gly Leu Val His Cys
1890 1895 1900

Asp Met Glu Gly Ala Ile Gly Lys Ser Glu Glu Thr Val Leu His Glu
1905 1910 1915 1920

His Tyr Ser Val Ile Arg Ile Thr Tyr Leu Ile Gly Asp Asp Asp Val
1925 1930 1935

Val Leu Val Ser Lys Ile Ile Pro Thr Ile Thr Pro Asn Trp Ser Arg
1940 1945 1950

Ile Leu Tyr Leu Tyr Lys Leu Tyr Trp Lys Asp Val Ser Ile Ile Ser
1955 1960 1965

Leu Lys Thr Ser Asn Pro Ala Ser Thr Glu Leu Tyr Leu Ile Ser Lys
1970 1975 1980

Asp Ala Tyr Cys Thr Ile Met Glu Pro Ser Glu Ile Val Leu Ser Lys
1985 1990 1995 2000

Leu Lys Arg Leu Ser Leu Leu Glu Glu Asn Asn Leu Leu Lys Trp Ile
2005 2010 2015

Ile Leu Ser Lys Lys Arg Asn Asn Glu Trp Leu His His Glu Ile Lys
2020 2025 2030

Glu Gly Glu Arg Asp Tyr Gly Ile Met Arg Pro Tyr His Met Ala Leu
2035 2040 2045

Gln Ile Phe Gly Phe Gln Ile Asn Leu Asn His Leu Ala Lys Glu Phe
2050 2055 2060

Leu Ser Thr Pro Asp Leu Thr Asn Ile Asn Asn Ile Ile Gln Ser Phe
2065 2070 2075 2080

Gln Arg Thr Ile Lys Asp Val Leu Phe Glu Trp Ile Asn Ile Thr His
2085 2090 2095

Asp Asp Lys Arg His Lys Leu Gly Gly Arg Tyr Asn Ile Phe Pro Leu
2100 2105 2110

Lys Asn Lys Gly Lys Leu Arg Leu Leu Ser Arg Arg Leu Val Leu Ser
2115 2120 2125

Trp Ile Ser Leu Ser Leu Ser Thr Arg Leu Leu Thr Gly Arg Phe Pro
2130 2135 2140

Asp Glu Lys Phe Glu His Arg Ala Gln Thr Gly Tyr Val Ser Leu Ala
2145 2150 2155 2160

Asp Thr Asp Leu Glu Ser Leu Lys Leu Leu Ser Lys Asn Ile Ile Lys
2165 2170 2175

Asn Tyr Arg Glu Cys Ile Gly Ser Ile Ser Tyr Trp Phe Leu Thr Lys
2180 2185 2190

Glu Val Lys Ile Leu Met Lys Leu Ile Gly Gly Ala Lys Leu Leu Gly
2195 2200 2205

Ile Pro Arg Gln Tyr Lys Glu Pro Glu Asp Gln Leu Leu Glu Asn Tyr
2210 2215 2220

Asn Gln His Asp Glu Phe Asp Ile Asp
2225 2230



(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15462 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 



ACCAAACAAG AGAAGAAACT TGCTTGGTAA TATAAATTTA ACTTAAAATT AACTTAGGAT 60
TTAAGACATT GACTAGAAGG TCAAGAAAAG GGAACTCTAT AATTTCAAAA ATGTTGAGCC 120
TATTTGATAC ATTTAATGCA CGTAGGCAAG AAAACATAAC AAAATCAGCC GGTGGAGCTA 180
TCATTCCTGG ACAGAAAAAT ACTGTCTCTA TATTCGCCCT TGGACCGACA ATAACTGATG 240
ATAATGAGAA AATGACATTA GCTCTTCTAT TTCTATCTCA TTCACTAGAT AATGAGAAAC 300
AACATGCACA AAGGGCAGGG TTCTTGGTGT CTTTATTGTC AATGGCTTAT GCCAATCCAG 360
AGCTCTACCT AACAACAAAT GGAAGTAATG CAGATGCCAA GTATGTCATA TACATGATTG 420
AGAAAGATCT AAAACGGCAA AAGTATGGAG GATTTGTGGT TAAGACGAGA GAGATGATAT 480
ATGAAAAGAC AACTGATTGG ATATTTGGAA GTGACCTGGA TTATGATCAG GAAACTATGT 540
TGCAGAACGG CAGGAACAAT TCAACAATTG AAGACCTTGT CCACACATTT GGGTATCCAT 600
CATGTTTAGG AGCTCTTATA ATACAGATCT GGATAGTTCT GGTCAAAGCT ATCACTAGTA 660
TCTCAGGGTT AAGAAAAGGC TTTTTCACCC GATTGGAAGC TTTCAGACAA GATGGAACAG 720
TGCAGGCAGG GCTGGTATTG AGCGGTGACA CAGTGGATCA GATTGGGTCA ATCATGCGGT 780
CTCAACAGAG CTTGGTAACT CTTATGGTTG AAACATTAAT AACAATGAAT ACCAGCAGAA 840
ATGACCTCAC AACCATAGAA AAGAATATAC AAATTGTTGG CAACTACATA AGAGATGCAG 900
GTCTCGCTTC ATTCTTCAAT ACAATCAGAT ATGGAATTGA GACCAGAATG GCAGCTTTGA 960
CTCTATCCAC TCTCAGACCA GATATCAATA GATTAAAAGC TTTGATGGAA CTGTATTTAT 1020
CAAAGGGACC ACGCGCTCCT TTCATCTGTA TCCTCAGAGA TCCTATACAT GGTGAGTTCG 1080
CACCAGGCAA CTATCCTGCC ATATGGAGCT ATGCAATGGG GGTGGCAGTT GTACAAAATA 1140
GAGCCATGCA ACAGTATGTG ACGGGAAGAT CATATCTAGA CATTGATATG TTCCAGCTAG 1200
GACAAGCAGT AGCACGTGAT GCCGAAGCTC AAATGAGCTC AACACTGGAA GATGAACTTG 1260
GAGTGACACA CGAAGCTAAA GAAAGCTTGA AGAGACATAT AAGGAACATA AACAGTTCAG 1320
AGACATCTTT CCACAAACCG ACAGGTGGAT CAGCCATAGA GATGGCAATA GATGAAGAGC 1380
CAGAACAATT CGAACATAGA GCAGATCAAG AACAAAATGG AGAACCTCAA TCATCCATAA 1440
TTCAATATGC CTGGGCAGAA GGAAATAGAA GCGATGATCA GACTGAGCAA GCTACAGAAT 1500
CTGACAATAT CAAGACCGAA CAACAAAACA TCAGAGACAG ACTAAACAAG AGACTCAACG 1560
ACAAGAAGAA ACAAAGCAGT CAACCACCCA CTAATCCCAC AAACAGAACA AACCAGGACG 1620
AAATAGATGA TCTGTTTAAC GCATTTGGAA GCAACTAATC GAATCAACAT TTTAATCTAA 1680
ATCAATAATA AATAAGAAAA ACTTAGGATT AAAGAATCCT ATCATACCGG AATATAGGGT 1740
GGTAAATTTA GAGTCTGCTT GAAACTCAAT CAATAGAGAG TTGATGGAAA GCGATGCTAA 1800
AAACTATCAA ATCATGGATT CTTGGGAAGA GGAATCAAGA GATAAATCAA CTAATATCTC 1860
CTCGGCCCTC AACATCATTG AATTCATACT CAGCACCGAC CCCCAAGAAG ACTTATCGGA 1920
AAACGACACA ATCAACACAA GAACCCAGCA ACTCAGTGCC ACCATCTGTC AACCAGAAAT 1980
CAAACCAACA GAAACAAGTG AGAAAGATAG TGGATCAACT GACAAAAATA GACAGTCTGG 2040
GTCATCACAC GAATGTACAA CAGAAGCAAA AGATAGAAAC ATTGATCAGG AAACTGTACA 2100



GAGAGGACCT GGGAGAAGAA GCAGCTCAGA TAGTAGAGCT GAGACTGTGG TCTCTGGAGG 2160 

AATCCCCAGA AGCATCACAG ATTCTAAAAA TGGAACCCAA AACACGGAGG ATATTGATCT 2220 

CAATGAAATT AGAAAGATGG ATAAGGACTC TATTGAGGGG AAAATGCGAC AATCTGCAAA 2280 

TGTTCCAAGC GAGATATCAG GAAGTGATGA CATATTTACA ACAGAACAAA GTAGAAACAG 2340 

TGATCATGGA AGAAGCCTGG AATCTATCAG TACACCTGAT ACAAGATCAA TAAGTGTTGT 2400 

TACTGCTGCA ACACCAGATG ATGAAGAAGA AATACTAATG AAAAATAGTA GGACAAAGAA 2460

AAGTTCTTCA ACACATCAAG AAGATGACAA AAGAATTAAA AAAGGGGGAA AAGGGAAAGA 2520

CTGGTTTAAG AAATCAAAAG ATACCGACAA CCAGATACCA ACATCAGACT ACAGATCCAC 2580 

ATCAAAAGGG CAGAAGAAAA TCTCAAAGAC AACAACCACC AACACCGACA CAAAGGGGCA 2640 

AACAGAAATA CAGACAGAAT CATCAGAAAC ACAATCCTCA TCATGGAATC TCATCATCGA 2700 

CAACAACACC GACCGGAACG AACAGACAAG CACAACTCCT CCAACAACAA CTTCCAGATC 2760 

AACTTATACA AAAGAATCGA TCCGAACAAA CTCTGAATCC AAACCCAAGA CACAAAAGAC 2820 

AAATGGAAAG GAAAGGAAGG ATACAGAAGA GAGCAATCGA TTTACAGAGA GGGCAATTAC 2880 

TCTATTGCAG AATCTTGGTG TAATTCAATC CACATCAAAA CTAGATTTAT ATCAAGACAA 2940 

ACGAGTTGTA TGTGTAGCAA ATGTACTAAA CAATGTAGAT ACTGCATCAA AGATAGATTT 3000 

CCTGGCAGGA TTAGTCATAG GGGTTTCAAT GGACAACGAC ACAAAATTAA CACAGATACA 3060 

AAATGAAATG CTAAACCTCA AAGCAGATCT AAAGAAAATG GACGAATCAC ATAGAAGATT 3120 

GATAGAAAAT CAAAGAGAAC AACTGTCATT GATCACGTCA CTAATTTCAA ATCTCAAAAT 3180 

TATGACTGAG AGAGGAGGAA AGAAAGACCA AAATGAATCC AATGAGAGAG TATCCATGAT 3240 

CAAAACAAAA TTGAAAGAAG AAAAGATCAA GAAGACCAGG TTTGACCCAC TTATGGAGGC 3300 

ACAAGGCATT GACAAGAATA TACCCGATCT ATATCGACAT GCAGGAGATA CACTAGAGAA 3360 

CGATGTACAA GTTAAATCAG AGATATTAAG TTCATACAAT GAGTCAAATG CAACAAGACT 3420 

AATACCCAAA AAAGTGAGCA GTACAATGAG ATCACTAGTT GCAGTCATCA ACAACAGCAA 3480 

TCTCTCACAA AGCACAAAAC AATCATACAT AAACGAACTC AAACGTTGCA AAAATGATGA 3540

AGAAGTATCT GAATTAATGG ACATGTTCAA TGAAGATGTC AACAATTGCC AATGATCCAA 3600

CAAAGAAACG ACACCGAACA AACAGACAAG AAACAACAGT AGATCAAAAC CTGTCAACAC 3660
ACACAAAATC AAGCAGAATG AAACAACAGA TATCAATCAA TATACAAATA AGAAAAACTT 3720
AGGATTAAAG AATAAATTAA TCCTTGTCCA AAATGAGTAT AACTAACTCT GCAATATACA 3780
CATTCCCAGA ATCATCATTC TCTGAAAATG GTCATATAGA ACCATTACCA CTCAAAGTCA 3840
ATGAACAGAG GAAAGCAGTA CCCCACATTA GAGTTGCCAA GATCGGAAAT CCACCAAAAC 3900
ACGGATCCCG GTATTTAGAT GTCTTCTTAC TCGGCTTCTT CGAGATGGAA CGAATCAAAG 3960
ACAAATACGG GAGTGTGAAT GATCTCGACA GTGACCCGAG TTACAAAGTT TGTGGCTCTG 4020
GATCATTACC AATCGGATTG GCTAAGTACA CTGGGAATGA CCAGGAATTG TTACAAGCCG 4080
CAACCAAACT GGATATAGAA GTGAGAAGAA CAGTCAAAGC GAAAGAGATG GTTGTTTACA 4140
CGGTACAAAA TATAAAACCA GAACTGTACC CATGGTCCAA TAGACTAAGA AAAGGAATGC 4200
TGTTCGATGC CAACAAAGTT GCTCTTGCTC CTCAATGTCT TCCACTAGAT AGGAGCATAA 4260
AATTTAGAGT AATCTTCGTG AATTGTACGG CAATTGGATC AATAACCTTG TTCAAAATTC 4320
CTAAGTCAAT GGCATCACTA TCTCTAACCA ACACAATATC AATCAATCTG CAGGTACACA 4380
TAAAAACAGG GGTTCAGACT GATTCTAAAG GGATAGTTCA AATTTTGGAT GAGAAAGGCG 4440
AAAAATCACT GAATTTCATG GTCCATCTCG GATTGATCAA AAGAAAAGTA GGCAGAATGT 4500
ACTCTGTTGA ATACTGTAAA CAGAAAATCG AGAAAATGAG ATTGATATTT TCTTTAGGAC 4560
TAGTTGGAGG AATCAGTCTT CATGTCAATG CAACTGGGTC CATATCAAAA ACACTAGCAA 4620
GTCAGCTGGT ATTCAAAAGA GAGATTTGTT ATCCTTTAAT GGATCTAAAT CCGCATCTCA 4680
ATCTAGTTAT CTGGGCTTCA TCAGTAGAGA TTACAAGAGT GGATGCAATT TTCCAACCTT 4740
CTTTACCTGG CGAGTTCAGA TACTATCCTA ATATTATTGC AAAAGGAGTT GGGAAAATCA 4800
AACAATGGAA CTAGTAATCT CTATTTTAGT CCGGACGTAT CTATTAAGCC GAAGCAAATA 4860
AAGGATAATC AAAAACTTAG GACAAAAGAG GTCAATACCA ACAACTATTA GCAGTCACAC 4920
TCGCAAGAAT AAGAGAGAAG GGACCAAAAA AGTCAAATAG GAGAAATCAA AACAAAAGGT 4980
ACAGAACACC AGAACAACAA AATCAAAACA TCCAACTCAC TCAAAACAAA AATTCCAAAA 5040
GAGACCGGCA ACACAACAAG CACTGAACAC AATGCCAACT TCAATACTGC TAATTATTAC 5100
AACCATGATC ATGGCATCTT TCTGCCAAAT AGATATCACA AAACTACAGC ACGTAGGTGT 5160
ATTGGTCAAC AGTCCCAAAG GGATGAAGAT ATCACAAAAC TTTGAAACAA GATATCTAAT 5220
TTTGAGCCTC ATACCAAAAA TAGAAGACTC TAACTCTTGT GGTGACCAAC AGATCAAGCA 5280
ATACAAGAAG TTATTGGATA GACTGATCAT CCCTTTATAT GATGGATTAA GATTACAGAA 5340
AGATGTGATA GTAACCAATC AAGAATCCAA TGAAAACACT GATCCCAGAA CAAAACGATT 5400
CTTTGGAGGG GTAATTGGAA CCATTGCTCT GGGAGTAGCA ACCTCAGCAC AAATTACAGC 5460
GGCAGTTGCT CTGGTTGAAG CCAAGCAGGC AAGATCAGAC ATCGAAAAAC TCAAAGAAGC 5520
AATTAGGGAC ACAAATAAAG CAGTGCAGTC AGTTCAGAGC TCCATAGGAA ATTTAATAGT 5580
AGCAATTAAA TCAGTCCAGG ATTATGTTAA CAAAGAAATC GTGCCATCGA TTGCGAGGCT 5640
AGGTTGTGAA GCAGCAGGAC TTCAATTAGG AATTGCATTA ACACAGCATT ACTCAGAATT 5700
AACAAACATA TTTGGTGATA ACATAGGATC GTTACAAGAA AAAGGAATAA AATTACAAGG 5760
TATAGCATCA TTATACCGCA CAAATATCAC AGAAATATTC ACAACATCAA CAGTTGATAA 5820
ATATGATATC TATGATCTGT TATTTACAGA ATCAATAAAG GTGAGAGTTA TAGATGTTGA 5880
CTTGAATGAT TACTCAATCA CCCTCCAAGT CAGACTCCCT TTATTAACTA GGCTGCTGAA 5940
CACTCAGATC TACAAAGTAG ATTCCATATC ATATAACATC CAAAACAGAG AATGGTATAT 6000
CCCTCTTCCC AGCCATATCA TGACGAAAGG GGCATTTCTA GGTGGAGCAG ACGTCAAAGA 6060
ATGTATAGAA GCATTCAGCA GCTATATATG CCCTTCTGAT CCAGGATTTG TATTAAACCA 6120
TGAAATAGAG AGCTGCTTAT CAGGAAACAT ATCCCAATGT CCAAGAACAA CGGTCACATC 6180
AGACATTGTT CCAAGATATG CATTTGTCAA TGGAGGAGTG GTTGCAAACT GTATAACAAC 6240
CACCTGTACA TGCAACGGAA TTGGTAATAG AATCAATCAA CCACCTGATC AAGGAGTAAA 6300
AATTATAACA CATAAAGAAT GTAGTACAGT AGGTATCAAC GGAATGCTGT TCAATACAAA 6360
TAAAGAAGGA ACTCTTGCAT TCTATACACC AAATGATATA ACACTAAACA ATTCTGTTAC 6420
ACTTGATCCA ATTGACATAT CAATCGAGCT CAACAAGGCC AAATCAGATC TAGAAGAATC 6480
AAAAGAATGG ATAAGAAGGT CAAATCAAAA ACTAGATTCT ATTGGAAATT GGCATCAATC 6540
TAGCACTACA ATCATAATTA TTTTGATAAT GATCATTATA TTGTTTATAA TTAATATAAC 6600
GATAATTACA ATTGCAATTA AGTATTACAG AATTCAAAAG AGAAATCGAG TGGATCAAAA 6660
TGACAAGCCA TATGTACTAA CAAACAAATA ACATATCTAC AGATCATTAG ATATTAAAAT 6720
TATAAAAAAC TTAGGAGTAA AGTTACGCAA TCCAACTCTA CTCATATAAT TGAGGAAGGA
CCCAATAGAC AAATCCAAAT TCGAGATGGA ATACTGGAAG CATACCAATC ACGGAAAGGA 
TGCTGGCAAT GAGCTGGAGA CGTCTATGGC TACTCATGGC AACAAGCTCA CTAATAAGAT 
AATATACATA TTATGGACAA TAATCCTGGT GTTATTATCA ATAGTCTTCA TCATAGTGCT 
AATTAATTCC ATCAAAAGTG AAAAGGCCCA CGAATCATTG CTGCAAGACA TAAATAATGA
GTTTATGGAA ATTACAGAAA AGATCCAAAT GGCATCGGAT AATACCAATG ATCTAATACA 
GTCAGGAGTG AATACAAGGC TTCTTACAAT TCAGAGTCAT GTCCAGAATT ACATACCAAT 
ATCATTGACA CAACAGATGT CAGATCTTAG GAAATTCATT AGTGAAATTA CAATTAGAAA 
TGATAATCAA GAAGTGCTGC CACAAAGAAT AACACATGAT GTAGGTATAA AACCTTTAAA 
TCCAGATGAT TTTTGGAGAT GCACGTCTGG TCTTCCATCT TTAATGAAAA CTCCAAAAAT 
AAGGTTAATG CCAGGGCCGG GATTATTAGC TATGCCAACG ACTGTTGATG GCTGTGTTAG 
AACTCCGTCT TTAGTTATAA ATGATCTGAT TTATGCTTAT ACCTCAAATC TAATTACTCG 
AGGTTGTCAG GATATAGGAA AATCATATCA AGTCTTACAG ATAGGGATAA TAACTGTAAA 
CTCAGACTTG GTACCTGACT TAAATCCTAG GATCTCTCAT ACCTTTAACA TAAATGACAA 
TAGGAAGTCA TGTTCTCTAG CACTCCTAAA TACAGATGTA TATCAACTGT GTTCAACTCC 
CAAAGTTGAT GAAAGATCAG ATTATGCATC ATCAGGCATA GAAGATATTG TACTTGATAT 
TGTCAATTAT GATGGTTCAA TCTCAACAAC AAGATTTAAG AATAATAACA TAAGCTTTGA 
TCAACCATAT GCTGCACTAT ACCCATCTGT TGGACCAGGG ATATACTACA AAGGCAAAAT 
AATATTTCTC GGGTATGGAG GTCTTGAACA TCCAATAAAT GAGAATGTAA TCTGCAACAC 
AACTGGGTGC CCCGGGAAAA CACAGAGAGA CTGTAATCAA GCGTCTCATA GTCCATGGTT 
TTCAGATAGG AGGATGGTCA ACTCCATCAT TGTTGCTGAC AAAGGCTTAA ACTCAATTCC 
AAAATTGAAA GTATGGACGA TATCTATGCG ACAAAATTAC TGGGGGTCAG AAGGAAGGTT 
ACTTCTACTA GGTAACAAGA TCTATATATA TACAAGATCT ACAAGTTGGC ATAGCAAGTT 
ACAATTAGGA ATAATTGATA TTACTGATTA CAGTGATATA AGGATAAAAT GGACATGGCA 
TAATGTGCTA TCAAGACCAG GAAACAATGA ATGTCCATGG GGACATTCAT GTCCAGATGG 
ATGTATAACA GGAGTATATA CTGATGCATA TCCACTCAAT CCCACAGGGA GCATTGTGTC
ATCTGTCATA TTAGACTCAC AAAAATCGAG AGTGAACCCA GTCATAACTT ACTCAACAGC
AACCGAAAGA GTAAACGAGC TGGCCATCCT AAACAGAACA CTCTCAGCTG GATATACAAC 
AACAAGCTGC ATTACACACT ATAACAAAGG ATATTGTTTT CATATAGTAG AAATAAATCA 
TAAAAGCTTA AACACATTTC AACCCATGTT GTTCAAAACA GAGATTCCAA AAAGCTGCAG 
TTAATCATAA TTAACCATAA TATGCATCAA TCTATCTATA ATACAAGTAT ATGATAAGTA 
ATCAGCAATC AGACAATAGA CAAAAGGGAA ATATAAAAAA CTTAGGAGCA AAGCGTGCTC 
GGGAAATGGA CACTGAATCT AACAATGGCA CTGTATCTGA CATACTCTAT CCTGAGTGTC 
ACCTTAACTC TCCTATCGTT AAAGGTAAAA TAGCACAATT ACACACTATT ATGAGTCTAC 
CTCAGCCTTA TGATATGGAT GACGACTCAA TACTAGTTAT CACTAGACAG AAAATAAAAC 
TTAATAAATT GGATAAAAGA CAACGATCTA TTAGAAGATT AAAATTAATA TTAACTGAAA 
AAGTGAATGA CTTAGGAAAA TACACATTTA TCAGATATCC AGAAATGTCA AAAGAAATGT 
TCAAATTATA TATACCTGGT ATTAACAGTA AAGTGACTGA ATTATTACTT AAAGCAGATA 
GAACATATAG TCAAATGACT GATGGATTAA GAGATCTATG GATTAATGTG CTATCAAAAT 
TAGCCTCAAA AAATGATGGA AGCAATTATG ATCTTAATGA AGAAATTAAT AATATATCGA
AAGTTCACAC AACCTATAAA TCAGATAAAT GGTATAATCC ATTCAAAACA TGGTTTACTA
TCAAGTATGA TATGAGAAGA TTACAAAAAG CTCGAAATGA GATCACTTTT AATGTTGGGA 
AGGATTATAA CTTGTTAGAA GACCAGAAGA ATTTCTTATT GATACATCCA GAATTGGTTT 
TGATATTAGA TAAACAAAAC TACAATGGTT ATCTAATTAC TCCTGAATTA GTATTGATGT 
ATTGTGACGT AGTCGAAGGC CGATGGAATA TAAGTGCATG TGCTAAGTTA GATCCAAAAT 
TACAATCTAT GTATCAGAAA GGTAATAACC TGTGGGAAGT GATAGATAAA TTGTTTCCAA 
TTATGGGAGA AAAGACATTT GATGTGATAT CGTTATTAGA ACCACTTGCA TTATCCTTAA 
TTCAAACTCA TGATCCTGTT AAAGAACTAA GAGGAGCTTT TTTAAATCAT GTGTTATCCG 
AGATGGAATT AATATTTGAA TCTAGAGAAT CGATTAAGGA ATTTCTGAGT GTAGATTACA 
TTGATAAAAT TTTAGATATA TTTAATAAGT CTACAATAGA TGAAATAGCA GAGATtfTTCT 
CTTTTTTTAG AACATTTGGG CATCCTCCAT TAGAAGCTAG TATTGCAGCA GAAAAGGTTA
GAAAATATAT GTATATTGGA AAACAATTAA AATTTGACAC TATTAATAAA TGTCATGCTA 
TCTTCTGTAC AATAATAATT AACGGATATA GAGAGACGCA TGGTGGACAG TGGCCTCCTG 9900

TGACATTACC TGATCATGCA CACGAATTCA TCATAAATGC TTACGGTTCA AACTCTGCGA 9960

TATCATATGA GAATGCTGTT GATTATTACC AGAGCTTTAT AGGAATAAAA TTCAATAAAT 10020 

TCATAGAGCC TCAGTTAGAT GAGGATTTGA CAATTTATAT GAAAGATAAA GCATTATCTC 10080 

CAAAAAAATC AAATTGGGAC ACAGTTTATC CTGCATCTAA TTTACTGTAC CGTACTAACG 10140 

CATCCAACGA ATCACGAAGA TTAGTTGAAG TATTTATAGC AGATAGTAAA TTTGATCCTC 10200

ATCAGATATT GGATTATGTA GAATCTGGGG ACTGGTTAGA TGATCCAGAA TTTAATATTT 10260 

CTTATAGTCT TAAAGAAAAA GAGATCAAAC AGGAAGGTAG ACTCTTTGCA AAAATGACAT 10320 

ACAAAATGAG AGCTACACAA GTTTTATCAG AGACACTACT TGCAAATAAC ATAGGAAAAT 10380 

TCTTTCAAGA AAATGGGATG GTGAAGGGAG AGATTGAATT ACTTAAGAGA TTAACAACCA 10440 

TATCAATATC AGGAGTTCCA CGGTATAATG AAGTGTACAA TAATTCTAAA AGCCATACAG 10500

ATGACCTTAA AACCTACAAT AAAATAAGTA ATCTTAATTT GTCTTCTAAT CAGAAATCAA 10560 

AGAAATTTGA ATTCAAGTCA ACGGATATCT ACAATGATGG ATACGAGACT GTGAGCTGTT 10620 

TCCTAACAAC AGATCTCAAA AAATACTGTC TTAATTGGAG ATATGAATCA ACAGCTCTAT 10680 

TTGGAGAAAC TTGCAACCAA ATATTTGGAT TAAATAAATT GTTTAATTGG TTACACCCTC 10740

GTCTTGAAGG AAGTACAATC TATGTAGGTG ATCCTTACTG TCCTCCATCA GATAAAGAAC 10800 

ATATATCATT AGAGGATCAC CCTGATTCTG GTTTTTACGT TCATAACCCA AGAGGGGGTA 10860 

TAGAAGGATT TTGTCAAAAA TTATGGACAC TCATATCTAT AAGTGCAATA CATCTAGCAG 10920 

CTGTTAGAAT AGGCGTGAGG GTGACTGCAA TGGTTCAAGG AGACAATCAA GCTATAGCTG 10980 

TAACCACAAG AGTACCCAAC AATTATGACT ACAGAGTTAA GAAGGAGATA GTTTATAAAG 11040 

ATGTAGTGAG ATTTTTTGAT TCATTAAGAG AAGTGATGGA TGATCTAGGT CATGAACTTA 11100

AATTAAATGA AACGATTATA AGTAGCAAGA TGTTCATATA TAGCAAAAGA ATCTATTATG 11160 

ATGGGAGAAT TCTTCCTCAA GCTCTAAAAG CATTATCTAG ATGTGTCTTC TGGTCAGAGA 11220 

CAGTAATAGA CGAAACAAGA TCAGCATCTT CAAATTTGGC AACATCATTT GCAAAAGCAA 11280 

TTGAGAATGG TTATTCACCT GTTCTAGGAT ATGCATGCTC AATTTTTAAG AACATTCAAC 11340 

AACTATATAT TGCCCTTGGG ATGAATATCA ATCCAACTAT AACACAGAAT ATCAGAGATC 11400 
AGTATTTTAG GAATCCAAAT TGGATGCAAT ATGCCTCTTT AATACCTGCT AGTGTTGGGG 11460
GATTCAATCA CATGGCCATG TCAAGATGTT TTGTAAGGAA TATTGGTGAT CCATCAGTTG 11520 
CCGCATTGGC TGATATTAAA AGATTTATTA AGGCGAATCT ATTAGACCGA AGTGTTCTTT 11580 
ATAGGATTAT GAATCAAGAA CCAGGTGAGT CATCTTTTTT TGACTGGGCT TCAGATCCAT 11640 
ATTCATGCAA TTTACCACAA TCTCAAAATA TAACCACCAT GATAAAAAAT ATAACAGCAA 11700 
GGAATGTATT ACAAGATTCA CCAAATCCAT TATTATCTGG ATTATTCACA AATACAATGA 11760 
TAGAAGAAGA TGAAGAATTA GCTGAGTTCC TGATGGACAG GAAGGTAATT CTCCCTAGAG 11820
TTGCACATGA TATTCTAGAT AATTCTCTCA CAGGAATTAG AAATGCCATA GCTGGAATGT 11880 
TAGATACGAC AAAATCACTA ATTCGGGTTG GCATAAATAG AGGAGGACTG ACATATAGTT 11940 
TGTTGAGGAA AATCAGTAAT TACGATCTAG TACAATATGA AACACTAAGT AGGACTTTGC 12000 
GACTAATTGT AAGTGATAAA ATCAAGTATG AAGATATGTG TTCGGTAGAC CTTGCCATAG 12060 
CATTGCGACA AAAGATGTGG ATTCATTTAT CAGGAGGAAG GATGATAAGT GGACTTGAAA 12120 
CGCCTGACCC ATTAGAATTA CTATCTGGGG TAGTAATAAC AGGATCAGAA CATTGTAAAA 12180 

TATGTTATTC TTCAGATGGC ACAAACCCAT ATACTTGGAT GTATTTACCC GGTAATATCA 12240 

AAATAGGATC AGCAGAAACA GGTATATCGT CATTAAGAGT TCCTTATTTT GGATCAGTCA 12300

CTGATGAAAG ATCTGAAGCA CAATTAGGAT ATATCAAGAA TCTTAGTAAA CCTGCAAAAG 12360 

CCGCAATAAG AATAGCAATG ATATATACAT GGGCATTTGG TAATGATGAG ATATCTTGGA 12420 

TGGAAGCCTC ACAGATAGCA CAAACACGTG CAAATTTTAC ACTAGATAGT CTCAAAATTT 12480 

TAACACCGGT AGCTACATCA ACAAATTTAT CACACAGATT TAAGGATACT GCAACTCAGA 12540 

TGAAATTCTC CAGTACATCA TTGATCAGAG TCAGCAGATT TATAACAATG TCCAATGATA 12600 

ACATGTCTAT CAAAGAAGCT AATGAAACCA AAGATACTAA TCTTATTTAT CAACAAATAA 12660 

TGTTAACAGG ATTAAGTGTT TTCGAATATT TATTTAGATT AAAAGAAACC ACAGGACACA 12720 

ACCCTATAGT TATGCATCTG CACATAGAAG ATGAGTGTTG TATTAAAGAA AGTTTTAATG 12780 

ATGAACATAT TAATCCAGAG TCTACATTAG AATTAATTCG ATATCCTGAA AGTAATGAAT 12840 

TTATTTATGA TAAAGACCCA CTCAAAGATG TGGACTTATC AAAACTTATG GTTATTAAAG 12900 

ACCATTCTTA CACAATTGAT ATGAATTATT GGGATGATAC TGACATCATA CATGCAATTT 12960 
CAATATGTAC TGCAATTACA ATAGCAGATA CTATGTCACA ATTAGATCGA GATAATTTAA
AAGAGATAAT AGTTATTGCA AATGATGATG ATATTAATAG CTTAATCACT GAATTTTTGA 
CTCTTGACAT ACTTGTATTT CTCAAGACAT TTGGTGGATT ATTAGTAAAT CAATTTGCAT
ACACTCTTTA TAGTCTAAAA ATAGAAGGTA GGGATCTCAT TTGGGATTAT ATAATGAGAA 
CACTGAGAGA TACTTCCCAT TCAATATTAA AAGTATTATC TAATGCATTA TCTCATCCTA 
AAGTATTCAA GAGGTTCTGG GATTGTGGAG TTTTAAACCC TATTTATGGT CCTAATATTG 
CTAGTCAAGA CCAGATAAAA CTTGCCCTAT CTATATGTGA ATATTCACTA GATCTATTTA 
TGAGAGAATG GTTGAATGGT GTATCACTTG AAATATACAT TTGTGACAGC GATATGGAAG 
TTGCAAATGA TAGGAAACAA GCCTTTATTT CTAGACACCT TTCATTTGTT TGTTGTTTAG
CAGAAATTGC ATCTTTCGGA CCTAACCTGT TAAACTTAAC ATACTTGGAG AGACTTGATC 
TATTGAAACA ATATCTTGAA TTAAATATTA AAGAAGACCC TACTCTTAAA TATGTACAAA 
TATCTGGATT ATTAATTAAA TCGTTCCCAT CAACTGTAAC ATACGTAAGA AAGACTGCAA 
TCAAATATCT AAGGATTCGC GGTATTAGTC CACCTGAGGT AATTGATGAT TGGGATCCGG 
TAGAAGATGA AAATATGCTG GATAACATTG TCAAAACTAT AAATGATAAC TGTAATAAAG 
ATAATAAAGG GAATAAAATT AACAATTTCT GGGGACTAGC ACTTAAGAAC TATCAAGTCC 
TTAAAATCAG ATCTATAACA AGTGATTCTG ATGATAATGA TAGACTAGAT GCTAATACAA 
GTGGTTTGAC ACTTCCTCAA GGAGGGAATT ATCTATCGCA TCAATTGAGA TTATTCGGAA 
TCAACAGCAC TAGTTGTCTG AAAGCTCTTG AGTTATCACA AATTTTAATG AAGGAAGTCA 
ATAAAGACAA GGACAGGCTC TTCCTGGGAG AAGGAGCAGG AGCTATGCTA GCATGTTATG 
ATGCCACATT AGGACCTGCA GTTAATTATT ATAATTCAGG TTTGAATATA ACAGATGTAA 
TTGGTCAACG AGAATTGAAA ATATTTCCTT CAGAGGTATC ATTAGTAGGT AAAAAATTAG
GAAATGTGAC ACAGATTCTT AACAGGGTAA AAGTACTGTT CAATGGGAAT CCTAATTCAA 
CATGGATAGG AAATATGGAA TGTGAGAGCT TAATATGGAG TGAATTAAAT GATAAGTCCA 
TTGGATTAGT ACATTGTGAT ATGGAAGGAG CTATCGGTAA ATCAGAAGAA ACTGTTCTAC
ATGAACATTA TAGTGTTATA AGAATTACAT ACTTGATTGG GGATGATGAT GTTGTTTTAG 
TTTCCAAAAT TATACCTACA ATCACTCCGA ATTGGTCTAG AATACTTTAT CTATATAAAT 
TATATTGGAA AGATGTAAGT ATAATATCAC TCAAAACTTC TAATCCTGCA TCAACAGAAT 14580
TATATCTAAT TTCGAAAGAT GCATATTGTA CTATAATGGA ACCTAGTGAA ATTGTTTTAT 14640 
CAAAACTTAA AAGATTGTCA CTCTTGGAAG AAAATAATCT ATTAAAATGG ATCATTTTAT 14700 

CAAAGAAGAG GAATAATGAA TGGTTACATC ATGAAATCAA AGAAGGAGAA AGAGATTATG 14760 

GAATCATGAG ACCATATCAT ATGGCACTAC AAATCTTTGG ATTTCAAATC AATTTAAATC 14820 

ATCTGGCGAA AGAATTTTTA TCAACCCCAG ATCTGACTAA TATCAACAAT ATAATCCAAA 14880 

GTTTTCAGCG AACAATAAAG GATGTTTTAT TTGAATGGAT TAATATAACT CATGATGATA 14940

AGAGACATAA ATTAGGCGGA AGATATAACA TATTCCCACT GAAAAATAAG GGAAAGTTAA 15000 

GACTGCTATC GAGAAGACTA GTATTAAGTT GGATTTCATT ATCATTATCG ACTCGATTAC 15060 

TTACAGGTCG CTTTCCTGAT GAAAAATTTG AACATAGAGC ACAGACTGGA TATGTATCAT 15120 

TAGCTGATAC TGATTTAGAA TCATTAAAGT TATTGTCGAA AAACATCATT AAGAATTACA 15180 

GAGAGTGTAT AGGATCAATA TCATATTGGT TTCTAACCAA AGAAGTTAAA ATACTTATGA 15240 

AATTGATTGG TGGTGCTAAA TTATTAGGAA TTCCCAGACA ATATAAAGAA CCCGAAGACC 15300

AGTTATTAGA AAACTACAAT CAACATGATG AATTTGATAT CGATTAAAAC ATAAATACAA 15360 

TGAAGATATA TCCTAACCTT TATCTTTAAG CCTAGGAATA GACAAAAAGT AAGAAAAACA 15420 

TGTAATATAT ATATACCAAA CAGAGTTCTT CTCTTGTTTG GT 15462 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2233 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

Met Asp Thr Glu Ser Asn Asn Gly Thr Val Ser Asp Ile Leu Tyr Pro

1 5 10 15

Glu Cys His Leu Asn Ser Pro Ile Val Lys Gly Lys Ile Ala Gln Leu
20 25 30

His Thr Ile Met Ser Leu Pro Gln Pro Tyr Asp Met Asp Asp Asp Ser
35 40 45

Ile Leu Val Ile Thr Arg Gln Lys Ile Lys Leu Asn Lys Leu Asp Lys
50 55 60

Arg Gln Arg Ser Ile Arg Arg Leu Lys Leu Ile Leu Thr Glu Lys Val
65 70 75 80

Asn Asp Leu Gly Lys Tyr Thr Phe Ile Arg Tyr Pro Glu Met Ser Lys
85 90 95

Glu Met Phe Lys Leu Tyr Ile Pro Gly Ile Asn Ser Lys Val Thr Glu
100 105 110

Leu Leu Leu Lys Ala Asp Arg Thr Tyr Ser Gln Met Thr Asp Gly Leu
115 120 125

Arg Asp Leu Trp Ile Asn Val Leu Ser Lys Leu Ala Ser Lys Asn Asp
130 135 140

Gly Ser Asn Tyr Asp Leu Asn Glu Glu Ile Asn Asn Ile Ser Lys Val
145 150 155 160

His Thr Thr Tyr Lys Ser Asp Lys Trp Tyr Asn Pro Phe Lys Thr Trp
165 170 175

Phe Thr Ile Lys Tyr Asp Met Arg Arg Leu Gln Lys Ala Arg Asn Glu
180 185 190

Ile Thr Phe Asn Val Gly Lys Asp Tyr Asn Leu Leu Glu Asp Gln Lys
195 200 205

Asn Phe Leu Leu Ile His Pro Glu Leu Val Leu Ile Leu Asp Lys Gln
210 215 220

Asn Tyr Asn Gly Tyr Leu Ile Thr Pro Glu Leu Val Leu Met Tyr Cys
225 230 235 240

Asp Val Val Glu Gly Arg Trp Asn Ile Ser Ala Cys Ala Lys Leu Asp
245 250 255

Pro Lys Leu Gln Ser Met Tyr Gln Lys Gly Asn Asn Leu Trp Glu Val
260 265 270

Ile Asp Lys Leu Phe Pro Ile Met Gly Glu Lys Thr Phe Asp Val Ile
275 280 285

Ser Leu Leu Glu Pro Leu Ala Leu Ser Leu Ile Gln Thr His Asp Pro
290 295 300 
Val Lys Glu Leu Arg Gly Ala Phe Leu Asn His Val Leu Ser Glu Met
305 310 315 320

Glu Leu Ile Phe Glu Ser Arg Glu Ser Ile Lys Glu Phe Leu Ser Val
325 330 335

Asp Tyr Ile Asp Lys Ile Leu Asp Ile Phe Asn Lys Ser Thr Ile Asp
340 345 350

Glu Ile Ala Glu Ile Phe Ser Phe Phe Arg Thr Phe Gly His Pro Pro
355 360 365

Leu Glu Ala Ser Ile Ala Ala Glu Lys Val Arg Lys Tyr Met Tyr Ile
370 375 380

Gly Lys Gln Leu Lys Phe Asp Thr Ile Asn Lys Cys His Ala Ile Phe
385 390 395 400

Cys Thr Ile Ile Ile Asn Gly Tyr Arg Glu Arg His Gly Gly Gln Trp
405 410 415

Pro Pro Val Thr Leu Pro Asp His Ala His Glu Phe Ile Ile Asn Ala
420 425 430

Tyr Gly Ser Asn Ser Ala Ile Ser Tyr Glu Asn Ala Val Asp Tyr Tyr
435 440 445

Gln Ser Phe Ile Gly Ile Lys Phe Asn Lys Phe Ile Glu Pro Gln Leu
450 455 460

Asp Glu Asp Leu Thr Ile Tyr Met Lys Asp Lys Ala Leu Ser Pro Lys
465 470 475 480

Lys Ser Asn Trp Asp Thr Val Tyr Pro Ala Ser Asn Leu Leu Tyr Arg
485 490 495

Thr Asn Ala Ser Asn Glu Ser Arg Arg Leu Val Glu Val Phe Ile Ala
500 505 510

Asp Ser Lys Phe Asp Pro His Gln Ile Leu Asp Tyr Val Glu Ser Gly
515 520 525

Asp Trp Leu Asp Asp Pro Glu Phe Asn Ile Ser Tyr Ser Leu Lys Glu
530 535 540

Lys Glu Ile Lys Gln Glu Gly Arg Leu Phe Ala Lys Met Thr Tyr Lys
545 550 555 560

Met Arg Ala Thr Gln Val Leu Ser Glu Thr Leu Leu Ala Asn Asn Ile
565 570 575 
Gly Lys Phe Phe Gln Glu Asn Gly Met Val Lys Gly Glu Ile Glu Leu
580 585 590

Leu Lys Arg Leu Thr Thr Ile Ser Ile Ser Gly Val Pro Arg Tyr Asn
595 600 605

Glu Val Tyr Asn Asn Ser Lys Ser His Thr Asp Asp Leu Lys Thr Tyr
610 615 620

Asn Lys Ile Ser Asn Leu Asn Leu Ser Ser Asn Gln Lys Ser Lys Lys
625 630 635 640

Phe Glu Phe Lys Ser Thr Asp Ile Tyr Asn Asp Gly Tyr Glu Thr Val
645 650 655

Ser Cys Phe Leu Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Ser Thr Ala Leu Phe Gly Glu Thr Cys Asn Gln Ile Phe Gly
675 680 685

Leu Asn Lys Leu Phe Asn Trp Leu His Pro Arg Leu Glu Gly Ser Thr
690 695 700

Ile Tyr Val Gly Asp Pro Tyr Cys Pro Pro Ser Asp Lys Glu His Ile
705 710 715 720

Ser Leu Glu Asp His Pro Asp Ser Gly Phe Tyr Val His Asn Pro Arg
725 730 735

Gly Gly Ile Glu Gly Phe Cys Gln Lys Leu Trp Thr Leu Ile Ser Ile
740 745 750

Ser Ala Ile His Leu Ala Ala Val Arg Ile Gly Val Arg Val Thr Ala
755 760 765

Met Val Gln Gly Asp Asn Gln Ala Ile Ala Val Thr Thr Arg Val Pro
770 775 780

Asn Asn Tyr Asp Tyr Arg Val Lys Lys Glu Ile Val Tyr Lys Asp Val
785 790 795 800

Val Arg Phe Phe Asp Ser Leu Arg Glu Val Met Asp Asp Leu Gly His
805 810 815

Glu Leu Lys Leu Asn Glu Thr Ile Ile Ser Ser Lys Met Phe Ile Tyr
820 825 830

Ser Lys Arg Ile Tyr Tyr Asp Gly Arg Ile Leu Pro Gln Ala Leu Lys
835 840 845

Ala Leu Ser Arg Cys Val Phe Trp Ser Glu Thr Val Ile Asp Glu Thr
850 855 860

Arg Ser Ala Ser Ser Asn Leu Ala Thr Ser Phe Ala Lys Ala Ile Glu
865 870 875 880

Asn Gly Tyr Ser Pro Val Leu Gly Tyr Ala Cys Ser Ile Phe Lys Asn
885 890 895

Ile Gln Gln Leu Tyr Ile Ala Leu Gly Met Asn Ile Asn Pro Thr Ile
900 905 910

Thr Gln Asn Ile Arg Asp Gln Tyr Phe Arg Asn Pro Asn Trp Met Gln
915 920 925

Tyr Ala Ser Leu Ile Pro Ala Ser Val Gly Gly Phe Asn His Met Ala
930 935 940

Met Ser Arg Cys Phe Val Arg Asn Ile Gly Asp Pro Ser Val Ala Ala
945 950 955 960

Leu Ala Asp Ile Lys Arg Phe Ile Lys Ala Asn Leu Leu Asp Arg Ser
965 970 975

Val Leu Tyr Arg Ile Met Asn Gln Glu Pro Gly Glu Ser Ser Phe Phe
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Cys Asn Leu Pro Gln Ser Gln Asn
995 1000 1005

Ile Thr Thr Met Ile Lys Asn Ile Thr Ala Arg Asn Val Leu Gln Asp
1010 1015 1020

Ser Pro Asn Pro Leu Leu Ser Gly Leu Phe Thr Asn Thr Met Ile Glu
1025 1030 1035 1040

Glu Asp Glu Glu Leu Ala Glu Phe Leu Met Asp Arg Lys Val Ile Leu
1045 1050 1055

Pro Arg Val Ala His Asp Ile Leu Asp Asn Ser Leu Thr Gly Ile Arg
1060 1065 1070

Asn Ala Ile Ala Gly Met Leu Asp Thr Thr Lys Ser Leu Ile Arg Val
1075 1080 1085

Gly Ile Asn Arg Gly Gly Leu Thr Tyr Ser Leu Leu Arg Lys Ile Ser
1090 1095 1100

Asn Tyr Asp Leu Val Gln Tyr Glu Thr Leu Ser Arg Thr Leu Arg Leu
1105 1110 1115 1120

Ile Val Ser Asp Lys Ile Lys Tyr Glu Asp Met Cys Ser Val Asp Leu
1125 1130 1135
Ala Ile Ala Leu Arg Gln Lys Met Trp Ile His Leu Ser Gly Gly Arg
1140 1145 1150

Met Ile Ser Gly Leu Glu Thr Pro Asp Pro Leu Glu Leu Leu Ser Gly
1155 1160 1165

Val Val Ile Thr Gly Ser Glu His Cys Lys Ile Cys Tyr Ser Ser Asp
1170 1175 1180

Gly Thr Asn Pro Tyr Thr Trp Met Tyr Leu Pro Gly Asn Ile Lys Ile
1185 1190 1195 1200

Gly Ser Ala Glu Thr Gly Ile Ser Ser Leu Arg Val Pro Tyr Phe Gly
1205 1210 1215

Ser Val Thr Asp Glu Arg Ser Glu Ala Gln Leu Gly Tyr Ile Lys Asn
1220 1225 1230

Leu Ser Lys Pro Ala Lys Ala Ala Ile Arg Ile Ala Met Ile Tyr Thr
1235 1240 1245

Trp Ala Phe Gly Asn Asp Glu Ile Ser Trp Met Glu Ala Ser Gln Ile
1250 1255 1260

Ala Gln Thr Arg Ala Asn Phe Thr Leu Asp Ser Leu Lys Ile Leu Thr
1265 1270 1275 1280

Pro Val Ala Thr Ser Thr Asn Leu Ser His Arg Phe Lys Asp Thr Ala
1285 1290 1295

Thr Gln Met Lys Phe Ser Ser Thr Ser Leu Ile Arg Val Ser Arg Phe
1300 1305 1310

Ile Thr Met Ser Asn Asp Asn Met Ser Ile Lys Glu Ala Asn Glu Thr
1315 1320 1325

Lys Asp Thr Asn Leu Ile Tyr Gln Gln Ile Met Leu Thr Gly Leu Ser
1330 1335 1340

Val Phe Glu Tyr Leu Phe Arg Leu Lys Glu Thr Thr Gly His Asn Pro
1345 1350 1355 1360

Ile Val Met His Leu His Ile Glu Asp Glu Cys Cys Ile Lys Glu Ser
1365 1370 1375

Phe Asn Asp Glu His Ile Asn Pro Glu Ser Thr Leu Glu Leu Ile Arg
1380 1385 1390

Tyr Pro Glu Ser Asn Glu Phe Ile Tyr Asp Lys Asp Pro Leu Lys Asp
1395 1400 1405 
Val Asp Leu Ser Lys Leu Met Val Ile Lys Asp His Ser Tyr Thr Ile
1410 1415 1420

Asp Met Asn Tyr Trp Asp Asp Thr Asp Ile Ile His Ala Ile Ser Ile
1425 1430 1435 1440

Cys Thr Ala Ile Thr Ile Ala Asp Thr Met Ser Gln Leu Asp Arg Asp
1445 1450 1455

Asn Leu Lys Glu Ile Ile Val Ile Ala Asn Asp Asp Asp Ile Asn Ser
1460 1465 1470

Leu Ile Thr Glu Phe Leu Thr Leu Asp Ile Leu Val Phe Leu Lys Thr
1475 1480 1485

Phe Gly Gly Leu Leu Val Asn Gln Phe Ala Tyr Thr Leu Tyr Ser Leu
1490 1495 1500

Lys Ile Glu Gly Arg Asp Leu Ile Trp Asp Tyr Ile Met Arg Thr Leu
1505 1510 1515 1520

Arg Asp Thr Ser His Ser Ile Leu Lys Val Leu Ser Asn Ala Leu Ser
1525 1530 1535

His Pro Lys Val Phe Lys Arg Phe Trp Asp Cys Gly Val Leu Asn Pro
1540 1545 1550

Ile Tyr Gly Pro Asn Ile Ala Ser Gln Asp Gln Ile Lys Leu Ala Leu
1555 1560 1565

Ser Ile Cys Glu Tyr Ser Leu Asp Leu Phe Met Arg Glu Trp Leu Asn
1570 1575 1580

Gly Val Ser Leu Glu Ile Tyr Ile Cys Asp Ser Asp Met Glu Val Ala
1585 1590 1595 1600

Asn Asp Arg Lys Gln Ala Phe Ile Ser Arg His Leu Ser Phe Val Cys
1605 1610 1615

Cys Leu Ala Glu Ile Ala Ser Phe Gly Pro Asn Leu Leu Asn Leu Thr
1620 1625 1630

Tyr Leu Glu Arg Leu Asp Leu Leu Lys Gln Tyr Leu Glu Leu Asn Ile
1635 1640 1645

Lys Glu Asp Pro Thr Leu Lys Tyr Val Gln Ile Ser Gly Leu Leu Ile
1650 1655 1660

Lys Ser Phe Pro Ser Thr Val Thr Tyr Val Arg Lys Thr Ala Ile Lys
1665 1670 1675 1680

Tyr Leu Arg Ile Arg Gly Ile Ser Pro Pro Glu Val Ile Asp Asp Trp
1685 1690 1695

Asp Pro Val Glu Asp Glu Asn Met Leu Asp Asn Ile Val Lys Thr Ile
1700 1705 1710

Asn Asp Asn Cys Asn Lys Asp Asn Lys Gly Asn Lys Ile Asn Asn Phe
1715 1720 1725

Trp Gly Leu Ala Leu Lys Asn Tyr Gln Val Leu Lys Ile Arg Ser Ile
1730 1735 1740

Thr Ser Asp Ser Asp Asp Asn Asp Arg Leu Asp Ala Asn Thr Ser Gly
1745 1750 1755 1760

Leu Thr Leu Pro Gln Gly Gly Asn Tyr Leu Ser His Gln Leu Arg Leu
1765 1770 1775

Phe Gly Ile Asn Ser Thr Ser Cys Leu Lys Ala Leu Glu Leu Ser Gln
1780 1785 1790

Ile Leu Met Lys Glu Val Asn Lys Asp Lys Asp Arg Leu Phe Leu Gly
1795 1800 1805

Glu Gly Ala Gly Ala Met Leu Ala Cys Tyr Asp Ala Thr Leu Gly Pro
1810 1815 1820

Ala Val Asn Tyr Tyr Asn Ser Gly Leu Asn Ile Thr Asp Val Ile Gly
1825 1830 1835 1840

Gln Arg Glu Leu Lys Ile Phe Pro Ser Glu Val Ser Leu Val Gly Lys
1845 1850 1855

Lys Leu Gly Asn Val Thr Gln Ile Leu Asn Arg Val Lys Val Leu Phe
1860 1865 1870

Asn Gly Asn Pro Asn Ser Thr Trp Ile Gly Asn Met Glu Cys Glu Ser
1875 1880 1885

Leu Ile Trp Ser Glu Leu Asn Asp Lys Ser Ile Gly Leu Val His Cys
1890 1895 1900

Asp Met Glu Gly Ala Ile Gly Lys Ser Glu Glu Thr Val Leu His Glu
1905 1910 1915 1920

His Tyr Ser Val Ile Arg Ile Thr Tyr Leu Ile Gly Asp Asp Asp Val
1925 1930 1935

Val Leu Val Ser Lys Ile Ile Pro Thr Ile Thr Pro Asn Trp Ser Arg
1940 1945 1950

Ile Leu Tyr Leu Tyr Lys Leu Tyr Trp Lys Asp Val Ser Ile Ile Ser
1955 1960 1965 
Leu Lys Thr Ser Asn Pro Ala Ser Thr Glu Leu Tyr Leu Ile Ser Lys
1970 1975 1980

Asp Ala Tyr Cys Thr Ile Met Glu Pro Ser Glu Ile Val Leu Ser Lys
1985 1990 1995 2000

Leu Lys Arg Leu Ser Leu Leu Glu Glu Asn Asn Leu Leu Lys Trp Ile
2005 2010 2015

Ile Leu Ser Lys Lys Arg Asn Asn Glu Trp Leu His His Glu Ile Lys
2020 2025 2030

Glu Gly Glu Arg Asp Tyr Gly Ile Met Arg Pro Tyr His Met Ala Leu
2035 2040 2045

Gln Ile Phe Gly Phe Gln Ile Asn Leu Asn His Leu Ala Lys Glu Phe
2050 2055 2060

Leu Ser Thr Pro Asp Leu Thr Asn Ile Asn Asn Ile Ile Gln Ser Phe
2065 2070 2075 2080

Gln Arg Thr Ile Lys Asp Val Leu Phe Glu Trp Ile Asn Ile Thr His
2085 2090 2095

Asp Asp Lys Arg His Lys Leu Gly Gly Arg Tyr Asn Ile Phe Pro Leu
2100 2105 2110

Lys Asn Lys Gly Lys Leu Arg Leu Leu Ser Arg Arg Leu Val Leu Ser
2115 2120 2125

Trp Ile Ser Leu Ser Leu Ser Thr Arg Leu Leu Thr Gly Arg Phe Pro
2130 2135 2140

Asp Glu Lys Phe Glu His Arg Ala Gln Thr Gly Tyr Val Ser Leu Ala
2145 2150 2155 2160

Asp Thr Asp Leu Glu Ser Leu Lys Leu Leu Ser Lys Asn Ile Ile Lys
2165 2170 2175

Asn Tyr Arg Glu Cys Ile Gly Ser Ile Ser Tyr Trp Phe Leu Thr Lys
2180 2185 2190

Glu Val Lys Ile Leu Met Lys Leu Ile Gly Gly Ala Lys Leu Leu Gly
2195 2200 2205

Ile Pro Arg Gln Tyr Lys Glu Pro Glu Asp Gln Leu Leu Glu Asn Tyr
2210 2215 2220

Asn Gln His Asp Glu Phe Asp Ile Asp
2225 2230 
(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15218 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
ACGCGAAAAA ATGCGTACTA CAAACTTGCA CATTCGAAAA AAATGGGGCA AATAAGAACT 60
TGATAAGTGC TATTTAAGTC TAACCTTTTC AATCAGAAAT GGGGTGCAAT TCACTGAGCA 120
TGATAAAGGT TAGATTACAA AATTTATTTG ACAATGACGA AGTAGCATTG TTAAAAATAA 180
CATGTTATAC TGATAAATTA ATTCTTCTGA CCAATGCATT AGCCAAAGCA GCAATACATA 240
CAATTAAATT AAACGGCATA GTTTTTATAC ATGTTATAAC AAGCAGTGAA GTGTGCCCTG 300
ATAACAATAT TGTAGTGAAA TCTAACTTTA CAACAATGCC AATACTACAA AATGGAGGAT 360
ACATATGGGA ATTGATTGAG TTGACACACT GCTCTCAATT AAACGGTTTA ATGGATGATA 420
ATTGTGAAAT CAAATTTTCT AAAAGACTAA GTGACTCAGT AATGACTAAT TATATGAATC 480
AAATATCTGA CTTACTTGGG CTTGATCTCA ATTCATGAAT TATGTTTAGT CTAATTCAAT 540
AGACATGTGT TTATTACCAT TTTAGTTAAT ATAAAAACTC ATCAAAGGGA AATGGGGCAA 600
ATAAACTCAC CTAATCAATC AAACCATGAG CACTACAAAT GACAACACTA CTATGCAAAG 660
ATTGATGATC ACAGACATGA GACCCCTGTC AATGGATTCA ATAATAACAT CTCTTACCAA 720
AGAAATCATC ACACACAAAT TCATATACTT GATAAACAAT GAATGTATTG TAAGAAAACT 780
TGATGAAAGA CAAGCTACAT TTACATTCTT AGTCAATTAT GAGATGAAGC TACTGCACAA 840
AGTAGGGAGT ACCAAATACA AAAAATACAC TGAATATAAT ACAAAATATG GCACTTTCCC 900
CATGCCTATA TTTATCAATC ACGGCGGGTT TCTAGAATGT ATTGGCATTA AGCCTACAAA 960
ACACACTCCT ATAATATACA AATATGACCT CAACCCGTGA ATTCCAACAA AAAAACCAAC 1020
CCAACCAAAC CAAACTATTC CTCAAACAAC AGTGCTCAAT AGTTAAGAAG GAGCTAATCC 1080
ATTTTAGTAA TTAAAAATAA AAGTAAAGCC AATAACATAA ATTGGGGCAA ATACAAAGAT 1140
GGCTCTTAGC AAAGTCAAGT TGAATGATAC ATTAAATAAG GATCAGCTGC TGTCATCCAG 1200

CAAATACACT ATTCAACGTA GTACAGGAGA TAATATTGAC ACTCCCAATT ATGATGTGCA 1260 

AAAACACCTA AACAAACTAT GTGGTATGCT ATTAATCACT GAAGATGCAA ATCATAAATT 1320 

CACAGGATTA ATAGGTATGT TATATGCTAT GTCCAGGTTA GGAAGGGAAG ACACTATAAA 1380 

GATACTTAAA GATGCTGGAT ATCATGTTAA AGCTAATGGA GTAGATATAA CAACATATCG 1440

TCAAGATATA AATGGAAAGG AAATGAAATT CGAAGTATTA ACATTATCAA GCTTGACATC 1500 

AGAAATACAA GTCAATATTG AGATAGAATC TAGAAAGTCC TACAAAAAAA TGCTAAAAGA 1560 

GATGGGAGAA GTGGCTCCAG AATATAGGCA TGATTCTCCA GACTGTGGGA TGATAATACT 1620 

GTGTATAGCT GCACTTGTGA TAACCAAATT AGCAGCAGGA GACAGATCAG GTCTTACAGC 1680

AGTAATTAGG AGGGCAAACA ATGTCTTAAA AAACGAAATA AAACGATACA AGGGCCTCAT 1740

ACCAAAGGAT ATAGCTAACA GTTTTTATGA AGTGTTTGAA AAACACCCTC ATCTTATAGA 1800 

TGTTTTCGTG CACTTTGGCA TTGCACAATC ATCCACAAGA GGGGGTAGTA GAGTTGAAGG 1860 

AATCTTTGCA GGATTGTTTA TGAATGCCTA TGGTTCAGGG CAAGTAATGC TAAGATGGGG 1920 

AGTTTTAGCC AAATCTGTAA AAAATATCAT GCTAGGACAT GCTAGTGTCC AGGCAGAAAT 1980 

GGAGCAAGTT GTGGAAGTCT ATGAGTATGC ACAGAAGTTG GGAGGAGAAG CTGGATTCTA 2040 

CCATATATTG AACAATCCAA AAGCATCATT GCTGTCATTA ACTCAATTTC CCAACTTCTC 2100 

AAGTGTGGTC CTAGGCAATG CAGCAGGTCT AGGCATAATG GGAGAGTATA GAGGTACACC 2160 

AAGAAACCAG GATCTTTATG ATGCAGCTAA AGCATATGCA GAGCAACTCA AAGAAAATGG 2220

AGTAATAAAC TACAGTGTAT TAGACTTAAC AGCAGAAGAA TTGGAAGCCA TAAAGCATCA 2280 

ACTCAACCCC AAAGAAGATG ATGTAGAGCT TTAAGTTAAC AAAAAATACG GGGCAAATAA 2340 

GTCAACATGG AGAAGTTTGC ACCTGAATTT CATGGAGAAG ATGCAAATAA CAAAGCTACC 2400 

AAATTCCTAG AATCAATAAA GGGCAAGTTC GCATCATCCA AAGATCCTAA GAAGAAAGAT 2460 

AGCATAATAT CTGTTAACTC AATAGATATA GAAGTAACTA AAGAGAGCCC GATAACATCT 2520 

GGCACCAACA TCATCAATCC AACAAGTGAA GCCGACAGTA CCCCAGAAAC AAAAGCCAAC 2580 

TACCCAAGAA AACCCCTAGT AAGCTTCAAA GAAGATCTCA CCCCAAGTGA CAACCCTTTT 2640 

TCTAAGTTGT ACAAGGAAAC AATAGAAACA TTTGATAACA ATGAAGAAGA ATCTAGCTAC 2700 
TCATATGAAG AGATAAATGA TCAAACAAAT GACAACATTA CAGCAAGACT AGATAGAATT 2760

GATGAAAAAT TAAGTGAAAT ATTAGGAATG CTCCATACAT TAGTAGTTGC AAGTGCAGGA 2820 

CCCACTTCAG CTCGCGATGG AATAAGAGAT GCTATGGTTG GTCTAAGAGA AGAGATGATA 2880

GAAAAAATAA GAGCGGAAGC ATTAATGACC AATGATAGGT TAGAGGCTAT GGCAAGACTT 2940 

AGGAATGAGG AAAGCGAAAA AATGGCAAAA GACACCTCAG ATGAAGTGTC TCTTAATCCA 3000 

ACTTCCAAAA AATTGAGTGA CTTGTTGGAA GACAACGATA GTGACAATGA TCTATCACTT 3060 

GATGATTTTT GATCAGCGAT CAACTCACTC AGCAATCAAC AACATCAATA AAACAGACAT 3120 

CAATCCATTG AATCAACTGC CAGACCGAAC AAACAAACGT CCATCAGTAG AACCACCAAC 3180

CAATCAATCA ACCAATTGAT CAATCAGCAA CCCGACAAAA TTAACAATAT AGTAACAAAA 3240 

AAAGAACAAG ATGGGGCAAA TATGGAAACA TACGTGAACA AGCTTCACGA AGGCTCCACA 3300 

TACACAGCAG CTGTTCAGTA CAATGTTCTA GAAAAAGATG ATGATCCTGC ATCACTAACA 3360 

ATATGGGTGC CTATGTTCCA GTCATCTGTG CCAGCAGACT TGCTCATAAA AGAACTTGCA 3420 

AGCATCAATA TACTAGTGAA GGAGATCTCT ACGCCCAAAG GACCTTCACT ACGAGTCACG 3480 

ATTAACTCAA GAAGTGCTGT GCTGGCTCAA ATGCCTAGTA ATTTCATCAT AAGCGCAAAT 3540 

GTATCATTAG ATGAAAGAAG CAAATTAGCA TATGATGTAA CTACACCTTG TGAAATCAAA 3600 

GCATGCAGTC TAACATGCTT AAAAGTAAAA AGTATGTTAA CTACAGTCAA AGATCTTACC 3660

ATGAAGACAT TCAACCCCAC TCATGAGATC ATTGCTCTAT GTGAATTTGA AAATATTATG 3720 

ACATCAAAAA GAGTAATAAT ACCAACCTAT CTAAGATCAA TTAGTGTCAA GAACAAGGAT 3780 

CTGAACTCAC TAGAAAATAT AGCAACCACC GAATTCAAAA ATGCTATCAC CAATGCAAAA 3840 

ATTATTCCTT ATGCAGGATT AGTGTTAGTT ATCACAGTTA CTGACAATAA AGGAGCATTC 3900 

AAATATATCA AACCACAGAG TCAATTTATA GTAGATCTTG GTGCCTACCT AGAAAAAGAG 3960 

AGCATATATT ATGTGACTAC TAATTGGAAG CATACAGCTA CACGTTTTTC AATCAAACCA 4020 

CTAGAGGATT AAACTTAATT ATCAACACTG AATGACAGGT CCACATATAT CCTCAAACTA 4080 

CACACTATAT CCAAACATCA TAAACATCTA CACTACACAC TTCATCACAC AAACCAATCC 4140 

CACTCAAAAT CCAAAATCAC TACCAGCCAC TATCTGCTAG ACCTAGAGTG CGAATAGGTA 4200

AATAAAACCA AAATATGGGG TAAATAGACA TTAGTTAGAG TTCAATCAAT CTTAACAACC 4260 
ATTTATACCG CCAATTCAAC ACATATACTA TAAATCTTAA AATGGGAAAT ACATCCATCA 4320

CAATAGAATT CACAAGCAAA TTTTGGCCCT ATTTTACACT AATACATATG ATCTTAACTC 4380 

TAATCTTTTT ACTAATTATA ATCACTATTA TGATTGCAAT ACTAAATAAG CTAAGTGAAC 4440 

ATAAAGCATT CTGTAACAAA ACTCTTGAAC TAGGACAGAT GTATCAAATC AACACATAGA 4500 

GTTCTACCAT TATGCTGTGT CAAATTATAA TCCTGTATAT ATAAACAAAC AAATCCAATC 4560 

TTCTCACAGA GTCATGGTGT CGCAAAACCA CGCTAACTAT CATGGTAGCA TAGAGTAGTT 4620 

ATTTAAAAAT TAACATAATG ATGAATTGTT AGTATGAGAT CAAAAACAAC ATTGGGGCAA 4680 

ATGCAACCAT GTCCAAACAC AAGAATCAAC GCACTGCCAG GACTCTAGAA AAGACCTGGG 4740 

ATACTCTTAA TCATCTAATT GTAATATCCT CTTGTTTATA CAGATTAAAT TTAAAATCTA 4800

TAGCACAAAT AGCACTATCA GTTTTGGCAA TGATAATCTC AACCTCTCTC ATAATTGCAG 4860 

CCATAATATT CATCATCTCT GCCAATCACA AAGTTACACT AACAACGGTC ACAGTTCAAA 4920 

CAATAAAAAA CCACACTGAA AAAAACATCA CCACCTACCC TACTCAAGTC TCACCAGAAA 4980 

GGGTTAGTTC ATCCAAGCAA CCCACAACCA CATCACCAAT CCACACAAGT TCAGCTACAA 5040 

CATCACCCAA TACAAAATCA GAAACACACC ATACAACAGC ACAAACCAAA GGCAGAACCA 5100 

CCACTTCAAC ACAGACCAAC AAGCCAAGCA CAAAACCACG TCCAAAAAAT CCACCAAAAA 5160 

AAGATGATTA CCATTTTGAA GTGTTCAACT TCGTTCCCTG CAGTATATGT GGCAACAATC 5220 

AACTTTGCAA ATCCATCTGC AAAACAATAC CAAGCAACAA ACCAAAGAAG AAACCAACCA 5280 

TCAAACCCAC AAACAAACCA ACCACCAAAA CCACAAACAA AAGAGACCCA AAAACACCAG 5340 

CCAAAACGAC GAAAAAAGAA ACTACCACCA ACCCAACAAA AAAACTAACC CTCAAGACCA 5400 

CAGAAAGAGA CACCAGCACC TCACAATCCA CTGCACTCGA CACAACCACA TTAAAACACA 5460 

CAGTCCAACA GCAATCCCTC CTCTCAACCA CCCCCGAAAA CACACCCAAC TCCACACAAA 5520 

CACCCACAGC ATCCGAGCCC TCCACACCAA ACTCCACCCA AAAAACCCAG CCACATGCTT 5580 

AGTTATTCAA AAACTACATC TTAGCAGAGA ACCGTGATCT ATCAAGCAAG AACGAAATTA 5640 

AACCTGGGGC AAATAACCAT GGAGTTGATG ATCCACAAGT CAAGTGCAAT CTTCCTAACT 5700 

CTTGCTATTA ATGCATTGTA CCTCACCTCA AGTCAGAACA TAACTGAGGA GTTTTACCAA 5760 

TCGACATGTA GTGCAGTTAG CAGAGGTTAT TTTAGTGCTT TAAGAACAGG TTGGTATACT 5820 
AGTGTCATAA CAATAGAATT AAGTAATATA AAAGAAACCA AATGCAATGG AACTGACACT
AAAGTAAAAC TTATGAAACA AGAATTAGAT AAGTATAAGA ATGCAGTAAC AGAATTACAG 
CTACTTATGC AAAACACACC AGCTGTCAAC AACCGGGCCA GAAGAGAAGC ACCACAGTAT 
ATGAACTACA CAATCAATAC CACTAAAAAC CTAAATGTAT CAATAAGCAA GAAGAGGAAA 
CGAAGATTTC TAGGCTTCTT GTTAGGTGTG GGATCTGCAA TAGCAAGTGG TATAGCTGTA
TCAAAAGTTC TACACCTTGA AGGAGAAGTG AACAAGATCA AAAATGCTTT GTTGTCTACA 
AACAAAGCTG TAGTCAGTTT ATCAAATGGG GTCAGTGTTT TAACCAGCAA AGTGTTAGAT 
CTCAAGAATT ACATAAATAA CCAATTATTA CCCATAGTAA ATCAACAGAG CTGTCGCATC 
TCCAACATTG AAACAGTTAT AGAATTCCAG CAGAAGAACA GCAGATTGTT GGAAATCACC 
AGAGAATTTA GTGTCAATGC AGGTGTAACA ACACCTTTAA GCACTTACAT GTTGACAAAC 
AGTGAGTTAC TATCATTAAT CAATGATATG CCTATAACAA ATGATCAGAA AAAATTAATG 
TCAAGCAATG TTCAGATAGT AAGGCAACAA AGTTATTCCA TCATGTCTAT AATAAAGGAA 
GAAGTCCTTG CATATGTTGT ACAGCTGCCT ATCTATGGTG TAATAGATAC ACCTTGCTGG 
AAATTGCACA CATCGCCTCT ATGCACTACC AACATCAAAG AAGGATCAAA TATTTGTTTA 
ACAAGGACTG ATAGAGGATG GTATTGTGAT AATGCAGGAT CAGTATCCTT CTTTCCACAG
GCTGACACTT GTAAAGTACA GTCCAATCGA GTATTTTGTG ACACTATGAA CAGTTTGACA 
TTACCAAGTG AAGTCAGCCT TTGTAACACT GACATATTCA ATTCCAAGTA TGACTGCAAA 
ATTATGACAT CAAAAACAGA CATAAGCAGC TCAGTAATTA CTTCTCTTGG AGCTATAGTG 
TCATGCTATG GTAAAACTAA ATGCACTGCA TCCAACAAAA ATCGTGGGAT TATAAAGACA 
TTTTCTAATG GTTGTGACTA TGTGTCAAAC AAAGGAGTAG ATACTGTGTC AGTGGGCAAC 
ACTTTATACT ATGTAAACAA GCTGGAAGGC AAGAACCTTT ATGTAAAAGG GGAACCTATA
ATAAATTACT ATGACCCTCT AGTGTTTCCT TCTGATGAGT TTGATGCATC AATATCTCAA 
GTCAATGAAA AAATCAATCA AAGTTTAGCT TTTATTCGTA GATCTGATGA ATTACTACAT 
AATGTAAATA CTGGCAAATC TACTACAAAT ATTATGATAA CTACAATTAT TATAGTAATC 
ATTGTAGTAT TGTTATCATT AATAGCTATT GGTTTACTGT TGTATTGTAA AGCCAAAAAC
ACACCAGTTA CACTAAGCAA AGACCAACTA AGTGGAATCA ATAATATTGC ATTCAGCAAA 
TAGACAAAAA ACCACCTGAT CATGTTTCAA CAACAATCTG CTGACCACCA ATCCCAAATC 7440

AACTTACAAC AAATATTTCA ACATCACAGT ACAGGCTGAA TCATTTCCTC ACATCATGCT 7500 

ACCCACATAA CTAAGCTAGA TCCTTAACTT ATAGTTACAT AAAAACCTCA AGTATCACAA 7560 

TCAACCACTA AATCAACACA TCATTCACAA AATTAACAGC TGGGGCAAAT ATGTCGCGAA 7620 

GAAATCCTTG TAAATTTGAG ATTAGAGGTC ATTGCTTGAA TGGTAGAAGA TGTCACTACA 7680

GTCATAATTA CTTTGAATGG CCTCCTCATG CATTACTAGT GAGGCAAAAC TTCATGTTAA 7740 

ACAAGATACT CAAGTCAATG GACAAAAGCA TAGACACTTT GTCTGAAATA AGTGGAGCTG 7800

CTGAACTGGA TAGAACAGAA GAATATGCTC TTGGTATAGT TGGAGTGCTA GAGAGTTACA 7860 

TAGGATCTAT AAACAACATA ACAAAACAAT CAGCATGTGT TGCTATGAGT AAACTTCTTA 7920 

TTGAGATCAA TAGTGATGAC ATTAAAAAGC TTAGAGATAA TGAAGAACCC AATTCACCTA 7980

AGATAAGAGT GTACAATACT GTTATATCAT ACATTGAGAG CAATAGAAAA AACAACAAGC 8040 

AAACCATCCA TCTGCTCAAG AGACTACCAG CAGACGTGCT GAAGAAGACA ATAAAGAACA 8100 

CATTAGATAT CCACAAAAGC ATAACCATAA GCAATCCAAA AGAGTCAACT GTGAATGATC 8160 

AAAATGACCA AACCAAAAAT AATGATATTA CCGGATAAAT ATCCTTGTAG TATATCATCC 8220 

ATATTGATCT CAAGTGAAAG CATGGTTGCT ACATTCAATC ATAAAAACAT ATTACAATTT 8280 

AACCATAACT ATTTGGATAA CCACCAGCGT TTATTAAATC ATATATTTGA TGAAATTCAT 8340 

TGGACACCTA AAAACTTATT AGATGCCACT CAACAATTTC TCCAACATCT TAACATCCCT 8400 

GAAGATATAT ATACAGTATA TATATTAGTG TCATAATGCT TGACCATAAC GACTCTATGT 8460

CATCCAACCA TAAAACTATT TTGATAAGGT TATGGGACAA AATGGATCCC ATTATTAATG 8520 

GAAACTCTGC TAATGTGTAT CTAACTGATA GTTATTTAAA AGGTGTTATC TCTTTTTCAG 8580 

AGTGTAATGC TTTAGGGAGT TATCTTTTTA ACGGCCCTTA TCTTAAAAAT GATTACACCA 8640

ACTTAATTAG TAGACAAAGC CCACTACTAG AGCATATGAA TCTTAAAAAA CTAACTATAA 8700 

CACAGTCATT AATATCTAGA TATCATAAAG GTGAACTGAA ATTAGAAGAA CCAACTTATT 8760 

TCCAGTCATT ACTTATGACA TATAAAAGTA TGTCCTCGTC TGAACAAATT GCTACAACTA 8820 

ACTTACTTAA AAAAATAATA CGAAGAGCCA TAGAAATAAG TGATGTAAAG GTGTACGCCA 8880 

TCTTGAATAA ACTAGGATTA AAGGAAAAGG ACAGAGTTAA GCCCAACAAT AATTCAGGTG 8940 
ATGAAAACTC AGTACTTACA ACCATAATTA AAGATGATAT ACTTTCGGCT GTGGAAAACA 9000
ATCAATCATA TACAAATTCA GACAAAAGTC ACTCAGTAAA TCAAAATATC ACTATCAAAA 9060
CAACACTCTT GAAAAAATTG ATGTGTTCAA TGCAACATCC TCCATCATGG TTAATACACT 9120
GGTTCAATTT ATATACAAAA TTAAATAACA TATTAACACA ATATCGATCA AATGAGGTAA 9180
AAAGTCATGG GTTTATATTA ATAGATAATC AAACTTTAAG TGGTTTTCAG TTTATTTTAA 9240
ATCAATATGG TTGTATCGTT TATCATAAAG GACTCAAAAA AATCACAACT ACTACTTACA 9300
ATCAATTTTT GACATGGAAA GACATCAGCC TTAGCAGATT AAATGTTTGC TTAATTACTT 9360
GGATAAGTAA TTGTTTAAAT ACATTAAACA AAAGCTTAGG GCTGAGATGT GGATTCAATA 9420
ATGTTGTGTT ATCACAATTA TTTCTTTATG GAGATTGTAT ACTGAAATTA TTTCATAATG 9480
AAGGCTTCTA CATAATAAAA GAAGTAGAGG GATTTATTAT GTCTTTAATT CTAAACATAA 9540
CAGAAGAAGA TCAATTTAGG AAACGATTTT ATAATAGCAT GCTAAATAAC ATCACAGATG 9600
CAGCTATTAA GGCTCAAAAG GACCTACTAT CAAGAGTATG TCACACTTTA TTAGACAAGA 9660
CAGTGTCTGA TAATATCATA AATGGTAAAT GGATAATCCT ATTAAGTAAA TTTCTTAAAT 9720
TGATTAAGCT TGCAGGTGAT AATAATCTCA ATAACTTGAG TGAGCTATAT TTTCTCTTCA 9780
GAATCTTTGG ACATCCAATG GTCGATGAAA GACAAGCAAT GGATTCTGTA AGAATTAACT 9840
GTAATGAAAC TAAGTTCTAC TTATTAAGTA GTCTAAGTAC ATTAAGAGGT GCTTTCATTT 9900
ATAGAATCAT AAAAGGGTTT GTAAATACCT ACAACAGATG GCCCACCTTA AGGAATGCTA 9960
TTGTCCTACC TCTAAGATGG TTAAACTACT ATAAACTTAA TACTTATCCA TCTCTACTTG 10020
AAATCACAGA AAATGATTTG ATTATTTTAT CAGGATTGCG GTTCTATCGT GAGTTTCATC 10080
TGCCTAAAAA AGTGGATCTT GAAATGATAA TAAATGACAA AGCCATTTCA CCTCCAAAAG 10140
ATCTAATATG GACTAGTTTT CCTAGAAATT ACATGCCATC ACATATACAA AATTATATAG 10200
AACATGAAAA GTTGAAGTTC TCTGAAAGCG ACAGATCGAG AAGAGTACTA GAGTATTACT 10260
TGAGAGATAA TAAATTCAAT GAATGCGATC TATACAATTG TGTAGTCAAT CAAAGCTATC 10320
TCAACAACTC TAATCACGTG GTATCACTAA CTGGTAAAGA AAGAGAGCTC AGTGTAGGTA 10380
GAATGTTTGC TATGCAACCA GGTATGTTTA GGCAAATCCA AATCTTAGCA GAGAAAATGA 10440
TAGCTGAAAA TATTTTACAA TTCTTCCCTG AGAGTTTGAC AAGATATGGT GATCTAGAGC 10500






TTCAAAAGAT ATTAGAATTA AAAGCAGGAA TAAGCAACAA GTCAAATCGT TATAATGATA 10560 

ACTACAACAA TTATATCAGT AAATGTTCTA TCATTACAGA TCTTAGCAAA TTCAATCAGG 10620 

CATTTAGATA TGAAACATCA TGTATCTGCA GTGATGTATT AGATGAACTG CATGGAGTAC 10680 

AATCTCTGTT CTCTTGGTTG CATTTAACAA TACCTCTTGT CACAATAATA TGTACATATA 10740 

GACATGCACC TCCTTTCATA AAGGATCATG TTGTTAATCT TAATGAGGTT GATGAACAAA 10800 

GTGGATTATA CAGATATCAT ATGGGTGGTA TTGAGGGCTG GTGTCAAAAA CTGTGGACCA 10860 

TTGAAGCTAT ATCATTATTA GATCTAATAT CTCTCAAAGG GAAATTCTCT ATCACAGCTC 10920 

TGATAAATGG TGATAATCAG TCAATTGATA TAAGCAAACC AGTTAGACTT ATAGAGGGTC 10980 

AGACCCATGC ACAAGCAGAT TATTTGTTAG CATTAAATAG CCTTAAATTG TTATATAAAG 11040 

AGTATGCAGG TATAGGCCAT AAGCTTAAGG GAACAGAGAC CTATATATCC CGAGATATGC 11100 

AGTTCATGAG CAAAACAATC CAGCACAATG GAGTGTACTA TCCAGCCAGT ATCAAAAAAG 11160 

TCCTGAGAGT AGGTCCATGG ATAAACACGA TACTTGATGA TTTTAAAGTT AGTTTAGAAT 11220 

CTATAGGCAG CTTAACACAG GAGTTAGAAT ACAGAGGAGA AAGCTTATTA TGCAGTTTAA 11280 

TATTTAGGAA CATTTGGTTA TACAATCAAA TTGCTTTGCA ACTCCGAAAT CATGCATTAT 11340 

GTAACAATAA GCTATATTTA GATATATTGA AAGTATTAAA ACACTTAAAA ACTTTTTTTA 11400 

ATCTTGATAG CATTGATATG GCTTTATCAT TGTATATGAA TTTGCCTATG CTGTTTGGTG 11460 

GTGGTGATCC TAATTTGTTA TATCGAAGCT TTTATAGGAG AACTCCAGAC TTCCTTACAG 11520 

AAGCTATAGT ACATTCAGTG TTTGTGTTGA GCTATTATAC TGGTCACGAT TTACAAGATA 11580 

AGCTCCAGGA TCTTCCAGAT GATAGACTGA ACAAATTCTT GACATGTGTC ATCACATTTG 11640 

ATAAAAATCC CAATGCCGAG TTTGTAACAT TGATGAGGGA TCCACAGGCT TTAGGGTCTG 11700 

AAAGGCAAGC TAAAATTACT AGTGAGATTA ATAGATTAGC AGTAACAGAA GTCTTAAGTA 11760 

TAGCCCCAAA CAAAATATTT TCTAAAAGTG CACAACATTA TACTACCACT GAGATTGATC 11820 

TAAATGACAT TATGCAAAAT ATAGAACCAA CTTACCCTCA TGGATTAAGA GTTGTTTATG 11880 

AAAGTTTACC TTTTTATAAA GCAGAAAAAA TAGTTAATCT TATATCAGGA ACAAAATCCA 11940 

TAACTAATAT ACTTGAAAAA ACATCAGCAA TAGATACAAC TGATATTAAT AGGGCTACTG 12000 

ATATGATGAG GAAAAATATA ACTTTACTTA TAAGGATACT TCCACTAGAT TGTAACAAAG 12060 

ACAAAAGAGA GTTATTAAGT TTAGAAAATC TTAGTATAAC TGAATTAAGC AAGTATGTAA 12120 

GAGAAAGATC TTGGTCATTA TCCAATATAG TAGGAGTAAC ATCGCCAAGT ATTATGTTCA 12180 

CAATGGACAT TAAATATACA ACTAGCACTA TAGCCAGTGG TATAATAATA GAAAAATATA 12240 

ATGTTAATAG TTTAACTCGT GGTGAAAGAG GACCCACCAA GCCATGGGTA GGCTCATCCA 12300 

CGCAGGAGAA AAAAACAATG CCAGTGTACA ACAGACAAGT TTTAACCAAA AAGCAAAGAG 12360 

ACCAAATAGA TTTATTAGCA AAATTAGACT GGGTATATGC ATCCATAGAC AACAAAGATG 12420 

AATTCATGGA AGAACTGAGT ACTGGAACAC TTGGACTGTC ATATGAAAAA GCCAAAAAGT 12480 

TGTTTCCACA ATATCTAAGT GTCAATTATT TACACCGTTT AACAGTCAGT AGTAGACCAT 12540 

GTGAATTCCC TGCATCAATA CCAGCTTATA GAACAACAAA TTATCATTTT GATACTAGTC 12600 

CTATCAATCA TGTATTAACA GAAAAGTATG GAGATGAAGA TATCGACATT GTGTTTCAAA 12660 

ATTGCATAAG TTTTGGTCTT AGCCTGATGT CGGTTGTGGA ACAATTCACA AACATATGTC 12720 

CTAATAGAAT TATTCTCATA CCGAAGCTGA ATGAGATACA TTTGATGAAA CCTCCTATAT 12780 

TTACAGGAGA TGTTGATATC ATCAAGTTGA AGCAAGTGAT ACAAAAGCAG CACATGTTCC 12840 

TACCAGATAA AATAAGTTTA ACCCAATATG TAGAATTATT CTTAAGTAAC AAAGCACTTA 12900 

AATCTGGATC TCACATCAAC TCTAATTTAA TATTAGTACA TAAAATGTCT GATTATTTTC 12960 

ATAATGCTTA TATTTTAAGT ACTAATTTAG CTGGACATTG GATTCTGATT ATTCAACTTA 13020 

TGAAAGATTC AAAAGGTATT TTTGAAAAAG ATTGGGGAGA GGGGTACATA ACTGATCATA 13080 

TGTTCATTAA TTTGAATGTT TTCTTTAATG CTTATAAGAC TTATTTGCTA TGTTTTCATA 13140 

AAGGTTATGG TAAAGCAAAA TTAGAATGTG ATATGAACAC TTCAGATCTT CTTTGTGTTT 13200 

TGGAGTTAAT AGACAGTAGC TACTGGAAAT CTATGTCTAA AGTTTTCCTA GAACAAAAAG 13260 

TCATAAAATA CATAGTCAAT CAAGACACAA GTTTGCGTAG AATAAAAGGC TGTCACAGTT 13320 

TTAAGTTGTG GTTTTTAAAA CGCCTTAATA ATGCTAAATT TACCGTATGC CCTTGGGTTG 13380 

TTAACATAGA TTATCACCCA ACACACATGA AAGCTATATT ATCTTACATA GATTTAGTTA 13440 

GAATGGGGTT AATAAATGTA GATAAATTAA CCATTAAAAA TAAAAACAAA TTCAATGATG 13500 

AATTTTACAC ATCAAATCTC TTTTACATTA GTTATAACTT TTCAGACAAC ACTCATTTGC 13560 

TAACAAAACA AATAAGAATT GCTAATTCAG AATTAGAAGA TAATTATAAC AAACTATATC 13620 

ACCCAACCCC AGAAACTTTA GAAAATATGT CATTAATTCC TGTTAAAAGT AATAATAGTA 13680 
ACAAACCTAA ATTTTGTATA AGTGGAAATA CCGAATCTAT GATGATGTCA ACATTCTCTA 13740 
GTAAAATGCA TATTAAATCT TCCACTGTTA CCACAAGATT CAATTATAGC AAACAAGACT 13800 
TGTACAATTT ATTTCCAATT GTTGTGATAG ACAAGATTAT AGATCATTCA GGTAATACAG 13860 
CAAAATCTAA CCAACTTTAC ACCACCACTT CACATCAGAC ATCTTTAGTA AGGAATAGTG 13920 
CATCACTTTA TTGCATGCTT CCTTGGCATC ATGTCAATAG ATTTAACTTT GTATTTAGTT 13980 
CCACAGGATG CAAGATCAGT ATAGAGTATA TTTTAAAAGA TCTTAAGATT AAGGACCCCA 14040 
GTTGTATAGC ATTCATAGGT GAAGGAGCTG GTAACTTATT ATTACGTACG GTAGTAGAAC 14100 
TTCATCCAGA CATAAGATAC ATTTACAGAA GTTTAAAAGA TTGCAATGAT CATAGTTTAC 14160 
CTATTGAATT TCTAAGGTTA TACAACGGGC ATATAAACAT AGATTATGGT GAGAATTTAA 14220 
CCATTCCTGC TACAGATGCA ACTAATAACA TTCATTGGTC TTATTTACAT ATAAAATTTG 14280 
CAGAACCTAT TAGCATCTTT GTCTGCGATG CTGAATTACC TGTTACAGCC AATTGGAGTA 14340 
AAATTATAAT TGAATGGAGT AAGCATGTAA GAAAGTGCAA GTACTGTTCT TCTGTAAATA 14400 
GATGCATTTT AATTGCAAAA TATCATGCTC AAGATGACAT TGATTTCAAA TTAGATAACA 14460 
TTACTATATT AAAAACTTAC GTGTGCCTAG GTAGCAAGTT AAAAGGATCT GAAGTTTACT 14520 

TAATCCTTAC AATAGGCCCT GCAAATATAC TTCCTGTTTT TGATGTTGTA CAAAATGCTA 14580 

AATTGACACT TTCAAGAACT AAAAATTTCA TTATGCCTAA AAAAACTGAC AAGGAATCTA 14640 

TCGATGCAAA TATTAAAAGC TTAATACCTT TCCTTTGTTA CCCTATAACA AAAAAAGGAA 14700 

TTAAGACTTC ATTGTCAAAA TTGAAGAGTG TAGTTAATGG AGATATATTA TCATATTCTA 14760 

TAGCTGGACG TAATGAAGTA TTCAGCAACA AGCTTATAAA CCACAAGCAT ATGAATATCC 14820 

TAAAATGGCT AGATCATGTT TTAAATTTTA GATCAGCTGA ACTTAATTAC AATCATTTAT 14880 

ACATGATAGA GTCCACATAT CCTTACTTAA GTGAATTGTT AAATAGTTTA ACAACCAATG 14940 

AGCTCAAGAA GCTGATTAAA ATAACAGGTA GTGTGCTATA CAACCTTCCC AACGAACAGT 15000 

AGTTTAAAAT ATCATTAACA AGTTTGGTCA AATTTAGATG CTAACACATC ATTATATTAT 15060 

AGTTATTAAA AAATATACAA ACTTTTCAAT AATTTAGCAT ATXGATTCCA AAATTATCAT 15120 

TTTAGTCTTA AGGGGTTAAA TAAAAGTCTA AAACTAACAA TTATACATGT GCATTCACAA 15180 

CACAACGAGA CATTAGTTTT TGACACTTTT TTTCTCGT 15218 



(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2166 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu Thr Asp 
1 5 10 15 

Ser Tyr Leu Lys Gly Val Ile Ser Phe Ser Glu Cys Asn Ala Leu Gly 
20 25 30 

Ser Tyr Leu Phe Asn Gly Pro Tyr Leu Lys Asn Asp Tyr Thr Asn Leu 
35 40 45 

Ile Ser Arg Gln Ser Pro Leu Leu Glu His Met Asn Leu Lys Lys Leu 
50 55 60 

Thr Ile Thr Gln Ser Leu Ile Ser Arg Tyr His Lys Gly Glu Leu Lys 
65 70 75 80 

Leu Glu Glu Pro Thr Tyr Phe Gln Ser Leu Leu Met Thr Tyr Lys Ser 
85 90 95 

Met Ser Ser Ser Glu Gln Ile Ala Thr Thr Asn Leu Leu Lys Lys Ile 
100 105 110 

Ile Arg Arg Ala Ile Glu Ile Ser Asp Val Lys Val Tyr Ala Ile Leu 
115 120 125 

Asn Lys Leu Gly Leu Lys Glu Lys Asp Arg Val Lys Pro Asn Asn Asn 
130 135 140 

Ser Gly Asp Glu Asn Ser Val Leu Thr Thr Ile Ile Lys Asp Asp Ile 
145 150 155 160 

Leu Ser Ala Val Glu Asn Asn Gln Ser Tyr Thr Asn Ser Asp Lys Ser 
165 170 175 

His Ser Val Asn Gln Asn Ile Thr Ile Lys Thr Thr Leu Leu Lys Lys 
180 185 190 

Leu Met Cys Ser Met Gln His Pro Pro Ser Trp Leu Ile His Trp Phe 
195 200 205 

Asn Leu Tyr Thr Lys Leu Asn Asn Ile Leu Thr Gln Tyr Arg Ser Asn 
210 215 220 

Glu Val Lys Ser His Gly Phe Ile Leu Ile Asp Asn Gln Thr Leu Ser 
225 230 235 240 

Gly Phe Gln Phe Ile Leu Asn Gln Tyr Gly Cys Ile Val Tyr His Lys 
245 250 255 

Gly Leu Lys Lys Ile Thr Thr Thr Thr Tyr Asn Gln Phe Leu Thr Trp 
260 265 270 

Lys Asp Ile Ser Leu Ser Arg Leu Asn Val Cys Leu Ile Thr Trp Ile 
275 280 285 

Ser Asn Cys Leu Asn Thr Leu Asn Lys Ser Leu Gly Leu Arg Cys Gly 
290 295 300 

Phe Asn Asn Val Val Leu Ser Gln Leu Phe Leu Tyr Gly Asp Cys Ile 
305 310 315 320 

Leu Lys Leu Phe His Asn Glu Gly Phe Tyr Ile Ile Lys Glu Val Glu 
325 330 335 

Gly Phe Ile Met Ser Leu Ile Leu Asn Ile Thr Glu Glu Asp Gln Phe 
340 345 350 

Arg Lys Arg Phe Tyr Asn Ser Met Leu Asn Asn Ile Thr Asp Ala Ala 
355 360 365 

Ile Lys Ala Gln Lys Asp Leu Leu Ser Arg Val Cys His Thr Leu Leu 
370 375 380 

Asp Lys Thr Val Ser Asp Asn Ile Ile Asn Gly Lys Trp Ile Ile Leu 
385 390 395 400 

Leu Ser Lys Phe Leu Lys Leu Ile Lys Leu Ala Gly Asp Asn Asn Leu 
405 410 415 

Asn Asn Leu Ser Glu Leu Tyr Phe Leu Phe Arg Ile Phe Gly His Pro 
420 425 430 

Met Val Asp Glu Arg Gln Ala Met Asp Ser Val Arg Ile Asn Cys Asn 
435 440 445 

Glu Thr Lys Phe Tyr Leu Leu Ser Ser Leu Ser Thr Leu Arg Gly Ala 
450 455 460 

Phe Ile Tyr Arg Ile Ile Lys Gly Phe Val Asn Thr Tyr Asn Arg Trp 
465 470 475 480 

Pro Thr Leu Arg Asn Ala Ile Val Leu Pro Leu Arg Trp Leu Asn Tyr 
485 490 495 

Tyr Lys Leu Asn Thr Tyr Pro Ser Leu Leu Glu Ile Thr Glu Asn Asp 
500 505 510 

Leu Ile Ile Leu Ser Gly Leu Arg Phe Tyr Arg Glu Phe His Leu Pro 
515 520 525 

Lys Lys Val Asp Leu Glu Met Ile Ile Asn Asp Lys Ala Ile Ser Pro 
530 535 540 

Pro Lys Asp Leu Ile Trp Thr Ser Phe Pro Arg Asn Tyr Met Pro Ser 
545 550 555 560 

His Ile Gln Asn Tyr Ile Glu His Glu Lys Leu Lys Phe Ser Glu Ser 
565 570 575 

Asp Arg Ser Arg Arg Val Leu Glu Tyr Tyr Leu Arg Asp Asn Lys Phe 
580 585 590 

Asn Glu Cys Asp Leu Tyr Asn Cys Val Val Asn Gln Ser Tyr Leu Asn 
595 600 605 

Asn Ser Asn His Val Val Ser Leu Thr Gly Lys Glu Arg Glu Leu Ser 
610 615 620 

Val Gly Arg Met Phe Ala Met Gln Pro Gly Met Phe Arg Gln Ile Gln 
625 630 635 640 

Ile Leu Ala Glu Lys Met Ile Ala Glu Asn Ile Leu Gln Phe Phe Pro 
645 650 655 

Glu Ser Leu Thr Arg Tyr Gly Asp Leu Glu Leu Gln Lys Ile Leu Glu 
660 665 670 

Leu Lys Ala Gly Ile Ser Asn Lys Ser Asn Arg Tyr Asn Asp Asn Tyr 
675 680 685 

Asn Asn Tyr Ile Ser Lys Cys Ser Ile Ile Thr Asp Leu Ser Lys Phe 
690 695 700 

Asn Gln Ala Phe Arg Tyr Glu Thr Ser Cys Ile Cys Ser Asp Val Leu 
705 710 715 720 

Asp Glu Leu His Gly Val Gln Ser Leu Phe Ser Trp Leu His Leu Thr 
725 730 735 

Ile Pro Leu Val Thr Ile Ile Cys Thr Tyr Arg His Ala Pro Pro Phe 
740 745 750 

Ile Lys Asp His Val Val Asn Leu Asn Glu Val Asp Glu Gln Ser Gly 
755 760 765 

Leu Tyr Arg Tyr His Met Gly Gly Ile Glu Gly Trp Cys Gln Lys Leu 
770 775 780 

Trp Thr Ile Glu Ala Ile Ser Leu Leu Asp Leu Ile Ser Leu Lys Gly 
785 790 795 800 

Lys Phe Ser Ile Thr Ala Leu Ile Asn Gly Asp Asn Gln Ser Ile Asp 
805 810 815 

Ile Ser Lys Pro Val Arg Leu Ile Glu Gly Gln Thr His Ala Gln Ala 
820 825 830 

Asp Tyr Leu Leu Ala Leu Asn Ser Leu Lys Leu Leu Tyr Lys Glu Tyr 
835 840 845 

Ala Gly Ile Gly His Lys Leu Lys Gly Thr Glu Thr Tyr Ile Ser Arg 
850 855 860 

Asp Met Gln Phe Met Ser Lys Thr Ile Gln His Asn Gly Val Tyr Tyr 
865 870 875 880 

Pro Ala Ser Ile Lys Lys Val Leu Arg Val Gly Pro Trp Ile Asn Thr 
885 890 895 

Ile Leu Asp Asp Phe Lys Val Ser Leu Glu Ser Ile Gly Ser Leu Thr 
900 905 910 

Gln Glu Leu Glu Tyr Arg Gly Glu Ser Leu Leu Cys Ser Leu Ile Phe 
915 920 925 

Arg Asn Ile Trp Leu Tyr Asn Gln Ile Ala Leu Gln Leu Arg Asn His 
930 935 940 

Ala Leu Cys Asn Asn Lys Leu Tyr Leu Asp Ile Leu Lys Val Leu Lys 
945 950 955 960 

His Leu Lys Thr Phe Phe Asn Leu Asp Ser Ile Asp Met Ala Leu Ser 
965 970 975 

Leu Tyr Met Asn Leu Pro Met Leu Phe Gly Gly Gly Asp Pro Asn Leu 
980 985 990 

Leu Tyr Arg Ser Phe Tyr Arg Arg Thr Pro Asp Phe Leu Thr Glu Ala 
995 1000 1005 

Ile Val His Ser Val Phe Val Leu Ser Tyr Tyr Thr Gly His Asp Leu 
1010 1015 1020 



Gln Asp Lys Leu Gln Asp Leu Pro Asp Asp Arg Leu Asn Lys Phe Leu 
1025 1030 1035 1040 

Thr Cys Val Ile Thr Phe Asp Lys Asn Pro Asn Ala Glu Phe Val Thr 
1045 1050 1055 

Leu Met Arg Asp Pro Gln Ala Leu Gly Ser Glu Arg Gln Ala Lys Ile 
1060 1065 1070 

Thr Ser Glu Ile Asn Arg Leu Ala Val Thr Glu Val Leu Ser Ile Ala 
1075 1080 1085 

Pro Asn Lys Ile Phe Ser Lys Ser Ala Gln His Tyr Thr Thr Thr Glu 
1090 1095 1100 

Ile Asp Leu Asn Asp Ile Met Gln Asn Ile Glu Pro Thr Tyr Pro His 
1105 1110 1115 1120 

Gly Leu Arg Val Val Tyr Glu Ser Leu Pro Phe Tyr Lys Ala Glu Lys 
1125 1130 1135 

Ile Val Asn Leu Ile Ser Gly Thr Lys Ser Ile Thr Asn Ile Leu Glu 
1140 1145 1150 

Lys Thr Ser Ala Ile Asp Thr Thr Asp Ile Asn Arg Ala Thr Asp Met 
1155 1160 1165 

Met Arg Lys Asn Ile Thr Leu Leu Ile Arg Ile Leu Pro Leu Asp Cys 
1170 1175 1180 

Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser Ile Thr 
1185 1190 1195 1200 

Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn Ile 
1205 1210 1215 

Val Gly Val Thr Ser Pro Ser Ile Met Phe Thr Met Asp Ile Lys Tyr 
1220 1225 1230 

Thr Thr Ser Thr Ile Ala Ser Gly Ile Ile Ile Glu Lys Tyr Asn Val 
1235 1240 1245 

Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly 
1250 1255 1260 

Ser Ser Thr Gln Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gln Val 
1265 1270 1275 1280 

Leu Thr Lys Lys Gln Arg Asp Gln Ile Asp Leu Leu Ala Lys Leu Asp 
1285 1290 1295 






Trp Val Tyr Ala Ser Ile Asp Asn Lys Asp Glu Phe Met Glu Glu Leu 
1300 1305 1310 

Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe 
1315 1320 1325 

Pro Gln Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser 
1330 1335 1340 

Arg Pro Cys Glu Phe Pro Ala Ser Ile Pro Ala Tyr Arg Thr Thr Asn 
1345 1350 1355 1360 

Tyr His Phe Asp Thr Ser Pro Ile Asn His Val Leu Thr Glu Lys Tyr 
1365 1370 1375 

Gly Asp Glu Asp Ile Asp Ile Val Phe Gln Asn Cys Ile Ser Phe Gly 
1380 1385 1390 

Leu Ser Leu Met Ser Val Val Glu Gln Phe Thr Asn Ile Cys Pro Asn 
1395 1400 1405 

Arg Ile Ile Leu Ile Pro Lys Leu Asn Glu Ile His Leu Met Lys Pro 
1410 1415 1420 

Pro Ile Phe Thr Gly Asp Val Asp Ile Ile Lys Leu Lys Gln Val Ile 
1425 1430 1435 1440 

Gln Lys Gln His Met Phe Leu Pro Asp Lys Ile Ser Leu Thr Gln Tyr 
1445 1450 1455 

Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His Ile 
1460 1465 1470 

Asn Ser Asn Leu Ile Leu Val His Lys Met Ser Asp Tyr Phe His Asn 
1475 1480 1485 

Ala Tyr Ile Leu Ser Thr Asn Leu Ala Gly His Trp Ile Leu Ile Ile 
1490 1495 1500 

Gln Leu Met Lys Asp Ser Lys Gly Ile Phe Glu Lys Asp Trp Gly Glu 
1505 1510 1515 1520 

Gly Tyr Ile Thr Asp His Met Phe Ile Asn Leu Asn Val Phe Phe Asn 
1525 1530 1535 

Ala Tyr Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys Ala 
1540 1545 1550 

Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu 
1555 1560 1565 

Leu Ile Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu 
1570 1575 1580 

Gln Lys Val Ile Lys Tyr Ile Val Asn Gln Asp Thr Ser Leu Arg Arg 
1585 1590 1595 1600 

Ile Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asn 
1605 1610 1615 

Asn Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn Ile Asp Tyr His 
1620 1625 1630 

Pro Thr His Met Lys Ala Ile Leu Ser Tyr Ile Asp Leu Val Arg Met 
1635 1640 1645 

Gly Leu Ile Asn Val Asp Lys Leu Thr Ile Lys Asn Lys Asn Lys Phe 
1650 1655 1660 

Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr Ile Ser Tyr Asn Phe 
1665 1670 1675 1680 

Ser Asp Asn Thr His Leu Leu Thr Lys Gln Ile Arg Ile Ala Asn Ser 
1685 1690 1695 

Glu Leu Glu Asp Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr 
1700 1705 1710 

Leu Glu Asn Met Ser Leu Ile Pro Val Lys Ser Asn Asn Ser Asn Lys 
1715 1720 1725 

Pro Lys Phe Cys Ile Ser Gly Asn Thr Glu Ser Met Met Met Ser Thr 
1730 1735 1740 

Phe Ser Ser Lys Met His Ile Lys Ser Ser Thr Val Thr Thr Arg Phe 
1745 1750 1755 1760 

Asn Tyr Ser Lys Gln Asp Leu Tyr Asn Leu Phe Pro Ile Val Val Ile 
1765 1770 1775 

Asp Lys Ile Ile Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gln Leu 
1780 1785 1790 

Tyr Thr Thr Thr Ser His Gln Thr Ser Leu Val Arg Asn Ser Ala Ser 
1795 1800 1805 

Leu Tyr Cys Met Leu Pro Trp His His Val Asn Arg Phe Asn Phe Val 
1810 1815 1820 

Phe Ser Ser Thr Gly Cys Lys Ile Ser Ile Glu Tyr Ile Leu Lys Asp 
1825 1830 1835 1840 

Leu Lys Ile Lys Asp Pro Ser Cys Ile Ala Phe Ile Gly Glu Gly Ala 
1845 1850 1855 



Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp Ile Arg 
1860 1865 1870 

Tyr Ile Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro Ile 
1875 1880 1885 

Glu Phe Leu Arg Leu Tyr Asn Gly His Ile Asn Ile Asp Tyr Gly Glu 
1890 1895 1900 

Asn Leu Thr Ile Pro Ala Thr Asp Ala Thr Asn Asn Ile His Trp Ser 
1905 1910 1915 1920 

Tyr Leu His Ile Lys Phe Ala Glu Pro Ile Ser Ile Phe Val Cys Asp 
1925 1930 1935 

Ala Glu Leu Pro Val Thr Ala Asn Trp Ser Lys Ile Ile Ile Glu Trp 
1940 1945 1950 

Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys 
1955 1960 1965 

Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu 
1970 1975 1980 

Asp Asn Ile Thr Ile Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu 
1985 1990 1995 2000 

Lys Gly Ser Glu Val Tyr Leu Ile Leu Thr Ile Gly Pro Ala Asn Ile 
2005 2010 2015 

Leu Pro Val Phe Asp Val Val Gln Asn Ala Lys Leu Thr Leu Ser Arg 
2020 2025 2030 

Thr Lys Asn Phe Ile Met Pro Lys Lys Thr Asp Lys Glu Ser Ile Asp 
2035 2040 2045 

Ala Asn Ile Lys Ser Leu Ile Pro Phe Leu Cys Tyr Pro Ile Thr Lys 
2050 2055 2060 

Lys Gly Ile Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Asn Gly 
2065 2070 2075 2080 

Asp Ile Leu Ser Tyr Ser Ile Ala Gly Arg Asn Glu Val Phe Ser Asn 
2085 2090 2095 

Lys Leu Ile Asn His Lys His Met Asn Ile Leu Lys Trp Leu Asp His 
2100 2105 2110 

Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met 
2115 2120 2125 

Ile Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr 
2130 2135 2140 

Thr Asn Glu Leu Lys Lys Leu Ile Lys Ile Thr Gly Ser Val Leu Tyr 
2145 2150 2155 2160 

Asn Leu Pro Asn Glu Gln 
2165 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15229 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
ACGCGAAAAA ATGCGTACTA CAAACTTGCA CATTCGGAAA AAATGGGGCA AATAAGAATT 60 

TGATAAGTGC TATTTAAATC TAACCTTTTC AATCAGAAAT GGGGTGCAAT TCACTGAGCA 120 

TGATAAAGGT TAGATTACAA AATTTATTTG ACAATGACGA AGTAGCATTG TTAAAAATAA 180 

CATGTTATAC TGACAAATTA ATTCTTCTGA CCAATGCATT AGCCAAAGCA GTAATACATA 240 

CAATTAAATT AAACGGCATA GTTTTTATAC ATGTTATAAC AAGCAGTGAA GTGTGCCCTG 300 

ACAACAATAT TGTAGTGAAA TCTAACTTTA CAACAATGCC AATATTACAA AACGGAGGAT 360 

ACATATGGGA ATTGATTGAG TTGACACACT GCTCTCAATC AAATGGTCTA ATGGATGATA 420 

ATTGTGAAAT CAAATTTTCT AAAAGACTAA GTGACTCAGT AATGACTAAT TATATGAATC 480 

AAATATCTGA TTTACTTGGG CTTGATCTCA ATTCATGAAT TATGTTTAGT CTAATTTAAT 540 

AGACATGTGT TTATCACCAT TTTAGTTAAT ATAAAACCTC ATCAAAGGGA AATGGGGCAA 600 

ATAAACTCAC CTAATCAGTC AAACCATGAG CACTACAAAT GACAACACTA CTATGCAAAG 660 

ATTGATGATC ACAGACATGA GACCCCTGTC GATGGAATCA ATAATAACAT CTCTCACCAA 720 

AGAAATCATA ACACACAAAT TCATATACTT GATAAACAAT GAATGTATTG TAAGAAAACT 780 

TGATGAAAGA CAAGCTACAT TTACATTCTT AGTCAATTAT GAGATGAAGC TATTGCACAA 840 

AGTAGGGAGT ACCAAATACA AGAAATACAC TGAATATAAT ACAAAATATG GCACTTTCCC 900 

CATGCCTATA TTTATCAATC ATGACGGGTT TCTAGAATGT ATTGGCATTA AGCCTACAAA 960 

ACACACTCCT ATAATATACA AATATGACCT CAACCCGTAA ATTCCAACAA AAAACTAACC 1020 

CATCCAAACT AAGCTATTCC TCAAACAACA GTGCTCAACA GTTAAGAAGG AGCTAATCCA 1080 

TTTTAGTAAT TAAAAATAAA GGCAGAGCCA ATAACATAAA TTGGGGCAAA TACAAAGATG 1140 

GCTCTTAGCA AAGTCAAGTT AAATGATACA TTAAATAAGG ATCAGCTGCT GTCATCCAGC 1200 

AAATACACTA TTCAACGTAG TACAGGAGAT AATATTGAGA CTCCCAATTA TGATGTGCAA 1260 

AAACACCTAA ACAAACTATG TGGTATGCTA TTAATCACTG AAGATGCAAA TCATAAATTC 1320 

ACAGGATTAA TAGGTATGTT ATATGCTATG TCCAGGTTAG GAAGGGAAGA CACTATAAAG 1380 

ATACTTAAAG ATGCTGGATA TCATGTTAAA GCTAATGGAG TAGATATAAC AACATATCGT 1440 

CAAGATATAA ACGGAAAGGA AATGAAATTC GAAGTATTAA CATTATCAAG CTTGACATCA 1500 

GAAATACAAG TCAATATTGA GATAGAATCT AGAAAGTCCT ACAAAAAAAT GCTAAAAGAG 1560 

ATGGGAGAAG TGGCTCCAGA ATATAGGCAT GATTCTCCAG ACTGTGGGAT GATAATACTG 1620 

TGTATAGCTG CACTTGTAAT AACCAAGTTA GCAGCAGGAG ATAGATCAGG TCTTACAGCA 1680 

GTAATTAGGA GGGCAAACAA TGTCTTAAAA AACGAAATAA AACGCTACAA GGGCCTCATA 1740 

CCAAAGGATA TAGCTAACAG TTTTTATGAA GTGTTTGAAA AACACCCTCA TCTTATAGAT 1800 

GTTTTTGTGC ACTTTGGCAT TGCACAATCA TCCACAAGAG GGGGTAGTAG AGTTGAAGGA 1860 

ATCTTTGCAG GATTATTTAT GAATGCCTAT GGTTCAGGGC AAGTAATGCT AAGATGGGGA 1920 

GTTCTAGCCA AATCTGTAAA AAATATCATG CTAGGACATG CTAGTGTCCA GGCAGAAATG 1980 

GAACAAGTTG TGGAAGTTTA TGAGTATGCA CAGAAGTTGG GAGGAGAAGC TGGATTCTAC 2040 

CATATATTGA ACAATCCAAA AGCATCATTG CTGTCATTAA CTCAATTTCC TAACTTCTCA 2100 

AGTGTGGTCC TAGGCAATGC AGCAGGTCTA GGCATAATGG GAGAGTATAG AGGTACACCA 2160 

AGAAACCAAG ATCTATATGA TGCAGCCAAA GCATATGCAG AGCAACTCAA AGAAAATGGA 2220 

GTAATAAACT ACAGTGTATT AGACTTAACA GCAGAAGAAT TGGAAGCCAT AAAGCATCAA 2280 

CTCAACCCCA AAGAAGATGA TGTAGAGCTT TAAGTTAACA AAAAATACGG GGCAAATAAG 2340 

TCAACATGGA GAAGTTTGCA CCTGAATTTC ATGGAGAAGA TGCAAACAAC AAAGCTACCA 2400 






AATTCCTAGA ATCAATAAAG GGCAAGTTTG CATCATCCAA AGATCCTAAG AAGAAAGATA 2460 

GCATAATATC TGTTAACTCA ATAGATATAG AAGTAACTAA AGAGAGCCCG ATAACATCTG 2520 

GCACCAACAT CATCAATCCA ATAAGTGAAG CTGATAGTAC CCCAGAAGCT AAAGCCAACT 2580 

ACCCAAGAAA ACCCCTAGTA AGCTTCAAAG AAGATCTCAC CCCAAGTGAC AACCCCTTTT 2640 

CTAAGTTGTA CAAAGAAACA ATAGAAACAT TTGATAACAA TGAAGAAGAA TCTAGCTACT 2700 

CATATGAAGA AATAAATGAT CAAACAAATG ACAACATTAC AGCAAGACTA GATAGAATTG 2760 

ATGAAAAATT AAGTGAAATA TTAGGAATGC TCCATACATT AGTAGTTGCA AGTGCAGGAC 2820 

CCACCTCAGC TCGCGATGGA ATAAGAGATG CTATGGTTGG TCTAAGAGAA GAAATGATAG 2880 

AAAAAATAAG AGCGGAAGCA TTAATGACCA ATGATAGGTT AGAGGCTATG GCAAGACTTA 2940 

GGAATGAGGA AAGCGAAAAA ATGGCAAAAG ACACCTCAGA TGAAGTGTCT CTTAATCCAA 3000 

CTTCCAAAAA ATTGAGTAAT TTGTTGGAAG ACAACGATAG TGACAATGAT CTATCACTTG 3060 

ATGATTTTTG ATCAGTGATC AACTCACTCA GCAATCAACA ACATCAATGA AACAGACATC 3120 

AATCCATTGA ATCAACTGCC AGACTGAACA CACAAACGTC CATCAGCAGA ACTACCAACC 3180 

AATCAATCAA CCAATTGATC AATCAGCGAC CTAACAAAAT TAACAATATA GTAACAAAAA 3240 

AAGAACAAGA TGGGGCAAAT ATGGAAACAT ACGTGAACAA GCTTCACGAG GGCTCCACAT 3300 

ACACAGCAGC TGTTCAGTAC AATGTTCTAG AAAAAGATGA TGATCCTGCA TCACTAACAA 3360 

TATGGGTGCC TATGTTCCAG TCATCTGTGC CAGCAGACTT GCTCATAAAA GAACTTGCAA 3420 

GCATCAACAT ACTAGTGAAG CAGATCTCCA CGCCCAAAGG ACCTTCACTA CGAGTCACGA 3480 

TTAACTCAAG AAGTGCTGTG CTGGCACAAA TGCCTAGTAG TTTTATCATA AGTGCAAATG 3540 

TATCATTAGA TGAAAGAAGC AAATTAGCAT ATGATGTAAC TACACCTTGT GAAATCAAAG 3600 

CATGCAGTCT AACATGCTTA AAAGTAAAAA GTATGTTAAC TACAGTCAAA GATCTTACCA 3660 

TGAAAACATT CAATCCCACT CATGAGATTA TTGCTCTATG TGAATTTGAA AATATTATGA 3720 

CATCAAAAAG AGTAATAATA CCAACCTATC TAAGATCAAT TAGTGTCAAA AACAAGGACC 3780 

TGAACTCACT AGAAAATATA GCAACCACCG AATTCAAAAA TGCTATCACC AATGCGAAAA 3840 

TTATTCCCTA TGCAGGATTA GTATTAGTTA TCACAGTTAC TGACAATAAA GGAGCATTCA 3900 

AATATATCAA GCCACAGAGT CAATTTATAG TAGATCTTGG GGCCTACCTA GAAAAAGAGA 3960 

GCATATATTA TGTGACTACA AATTGGAAGC ATACAGCTAC ACGTTTTTCA ATCAAACCAC 4020 

TAGAGGATTA AACTTAATTA TCAACACTAA ATGACAGGTC CACATATATC TTCAAACTAT 4080 

ACATTATATC CAAACATCAT GAGCATTTAC ACTACACACT TTTACCATAT AAATCAATCT 4140 

CATTTAAAAT CCAAAATTAC TTCCAGCTAT CATCTGTTAG ACCTAGAGTG CGAATAGGTA 4200 

AATAAAACCA AAATATGGGG TAAATAGACA TTAGTTAGAG TTCAATCAAT CTCAACAACC 4260 

ATTTATACCG CCAATTCAGT ACATATACTA TAAATCTCAA AATGGGAAAT ACATCCATCA 4320 

CAATAGAATT CACAAGCAAA TTTTGGCCTT ATTTTACACT AATACATATG ATCTTAACTC 4380 

TAATCTCTTT ACTAATTATA ATCACTATTA TGATTGCAAT ACTAAATAAG CTAAGTGAAC 4440 

ATAAAACATT CTGCAACAAA ACTCTTGAAC TAGGACAGAT GTATCAAATC AACACATAGT 4500 

GTTCTACCAT TATGCTGTGT CAAATTATAA TCTTGTATAT ATAAACAAAC AAATCCAATC 4560 

TTCTCACAGA GTCATGGTGG CGCAAAACCA CGCCAACCAT CATGATAGCA TAGAGTAGTT 4620 

ATTTAAAAAT TAACATAATG ATGAATTATT GGTATGAGAT CAGGAACAAC ATTGGGGCAA 4680 

ATGCAGCCAT GTCCAAGCAC AAGAATCGGC GCACTGCCGG GACTCTAGAA AGGACCTGGG 4740 

ATACTCTTAA TCATCTAATT GTAATATCCT CTTGTTTATA CAGATTAAAT TTAAAATCTA 4800 

TAGCACAAAT AGCACTGTCA GTTTTGGCAA TGATAATCTC AACCTCTCTC ATAATTGCAG 4860 

CCATAATATT CATCATCTCT GCCAATCACA AAGTTACACT AACAACGGTT ACAGTTCAAA 4920 

CAATAAAAAA CCACACTGAA AAAAACATCT CCACCTACCT TACTCAAGTC CCACCAGAAA 4980 

GGGTCAACTC ATCCAAACAA CCCACAACCA CATCACCAAT CCACACAAAT TCAGCCACAA 5040 

TATCACCAAA TACAAAATCA GAAACACACC ATACAACAGC ACAAACCAAA GGCAGAATCA 5100 

CCACTTCAAC ACAGACCAAC AAGCCAAGCA CAAAATCACG TTCAAAAAAT CCACCAAAAA 5160 

AACCAAAAGA TGATTACCAT TTTGAAGTGT TCAATTTTGT TCCCTGTAGT ATATGTGGTA 5220 

ATAATCAACT CTGCAAATCC ATCTGCAAAA CAATACCAAG CAACAAACCA AAGAAAAAAC 5280 

CAACCATCAA ACCCACAAAC AAACCAACCA CCAAAACCAC AAACAAAAGA GACCCCAAAA 5340 

CACCAGCCAA AATGCCAAAA AAAGAAATCA TCACCAACCC AGCAAAAAAA CCAACCCTCA 5400 

AGACCACAGA AAGAGACACC AGCATTTCAC AATCCACCGT GCTCGACACA ATCACTCCAA 5460 

AATACACAAT CCAACAGCAA TCCCTCCACT CAACCACCTC CGAAAACACA CCCAGCTCCA 5520 

CACAAATACC CACAGCATCC GXGCCCTCCA CATTAAATCC TAATTAAAAA ACCTAGTCAC 5580 

ATGCTTAGTT ATTCAAAAAC TACATCTTAG CAGAGAACCG TGATCTATCA AGCAAGAACA 5640 

AAATTAAACC TGGGGCAAAT AACCATGGAG TTGCTGATCC ACAGGTCAAG TGCAATCTTC 5700 

CTAACTCTTG CTGTTAATGC ATTGTACCTC ACCTCAAGTC AGAACATAAC TGAGGAGTTT 5760 

TACCAATCGA CATGTAGTGC AGTTAGCAGA GGTTATTTTA GTGCTTTAAG AACAGGTTGG 5820 

TATACCAGTG TCATAACAAT AGAATTAAGT AATATAAAAG AAACCAAATG CAATGGAACT 5880 

GACACTAAAG TAAAACTTAT AAAACAAGAA TTAGATAAGT ATAAGAATGC AGTAACAGAA 5940 

TTACAGCTAC TTATGCAAAA CACGCCAGCT GCCAACAACC GGGCCAGAAG AGAAGCACCA 6000 

CAGTACATGA ACTACACAAT CAATACCACA AAAAACCTAA ATGTATCAAT AAGCAAGAAA 6060 

AGGAAACGAA GATTTCTGGG CTTCTTGTTA GGTGTAGGAT CTGCAATAGC AAGTGGTATA 6120 

GCTGTATCCA AAGTTTTACA CCTTGAAGGA GAAGTGAACA AAATCAAAAA TGCTTTGTTG 6180 

TCTACAAACA AAGCTGTAGT CAGTCTATCA AATGGGGTCA GTGTTTTAAC CAGCAAAGTG 6240 

TTAGATCTCA AGAATTACAT AAATAACCGA ATATTACCCA TAGTAAATCA ACAGAGCTGT 6300 

CGCATCTCCA ACATTGAAAC AGTTATAGAA TTCCAGCAGA AGAATAGCAG ATTGTTGGAA 6360 

ATCACCAGAG AATTTAGTGT TAATGCAGGT GTAACAACAC CTTTAAGCAC TTACATGTTA 6420 

ACAAACAGTG AGTTACTATC ATTGATCAAT GATATGCCTA TAACAAATGA CCAGAAAAAA 6480 

TTAATGTCAA GCAATGTTCA GATAGTAAGG CAACAAAGTT ATTCTATCAT GTCTATAATA 6540 

AAGGAAGAAG TCCTTGCATA TGTTGTACAG CTACCTATCT ATGGTGTAAT AGATACACCT 6600 

TGCTGGAAAT TACACACATC ACCTCTATGC ACCACCAACA TCAAAGAAGG ATCAAATATT 6660 

TGTTTAACAA GGACTGATAG AGGATGGTAT TGTGATAATG CAGGATCAGT ATCCTTCTTC 6720 

CCACAGGCTG ATACTTGCAA AGTACAGTCC AATCGAGTAT TTTGTGACAC TATGAACAGT 6780 

TTAACATTAC CAAGTGAAGT CAGCCTTTGT AACACTGACA TATTCAATTC CAAGTATGAC 6840 

TGCAAAATTA TGACATCAAA AACAGACATA AGCAGCTCAG TAATTACTTC TCTTGGAGCT 6900 

ATAGTGTCAT GCTATGGAAA AACTAAATGC ACTGCATCCA ATAAAAATCG TGGGATTATA 6960 

AAGACATTTT CTAATGGTTG TGACTATGTG TCAAACAAAG GAGTAGATAC TGTGTCAGTG 7020 

GGCAACACTT TATACTATGT AAACAAGCTG GAAGGCAAAA ACCTTTATGT AAAAGGGGAA 7080 

CCTATAATAA ATTACTATGA TCCTCTAGTG TTTCCTTCTG ATGAGTTTGA TGCATCAATA 7140 

TCTCAAGTCA ATGAAAAAAT CAATCAAAGT TTAGCTTTTA TTCGTAGATC TGATGAATTA 7200 

CTACATAATG TAAATACTGG CAAATCTACT ACAAATATTA TGATAACTAC AATTATTATA 7260 

GTAATCATTG TAGTATTGTT ATCATTAATA GCTATTGGTT TACTGTTGTA TTGCAAAGCC 7320 

AAAAACACAC CAGTTACACT AAGCAAAGAC CAACTAAGTG GAATCAATAA TATTGCATTC 7380 

AGCAAATAGA CAAAAAACTA CTTAATCATG TTTCAACAAC AATCTGCTGA CCACCAATCC 7440 

CAAATCAACT TAACAACAAA TATTTCAACA TCATAGCACA GGCTGAATCA TTTCCTCATA 7500 

TCATGCTACC TACACAACTA AGCTAGATCT TCAACTCATA GTTACATAAA AACCCCAAGT 7560 

ATCACAATCA AACACTAAAT CGACACATCA TTCACAAAAT TAACAACTGG GGCAAATATG 7620 

TCGCGAAGAA ATCCTTGTAA ATTTGAGATT AGAGGTCATT GCTTGAATGG TAGAAGATGT 7680 

CACTACAGTC ATAATTATTT TGAATGGCCT CCTCATGCAT TACTAGTGAG GCAAAACTTC 7740 

ATGTTAAACA AGATACTTAA GTCAATGGAC AAAAGCATAG ACACTTTGTC GGAAATAAGT 7800 

GGAGCTGCTG AACTGGATAG AACAGAAGAA TATGCTCTTG GTATAGTTGG AGTGCTAGAG 7860 

AGTTACATAG GATCAATAAA CAACATAACA AAACAATCAG CATGTGTTGC TATGAGTAAA 7920 

CTTCTTATTG AGATCAACAG TGATGACATT AAAAAACTGA GAGATAACGA AGAACCCAAT 7980 

TCGCCTAAGA TAAGAGTGTA CAATACTGTT ATATCATACA TTGAGAGCAA TAGAAAAAAC 8040 

AACAAGCAAA CCATCCATCT GCTCAAAAGA CTACCAGCAG ACGTGCTGAA GAAGACAATA 8100 

AAGAACACAT TAGATATCCA CAAAAGCATA ACCATAAGCA ACTCAAAAGA GTCAACCGTG 8160 

AATGATCAAA ATGACCAAAC CAAAAATAAT GATATTACCG GATAAATATC CTTGTAGTAT 8220 

ATCATCCATA TTGATTTCAA GTGAAAGCAT GATTGCTACA TTCAATCATA AAAACATATT 8280 

ACAATTTAAC CATAACCATT TGGATAACCA CCAGTGTTTA TTAAATCATA TATTTGATGA 8340 

AATTCATTGG ACACCTAAAA ACTTATTAGA TGCCACTCAA CAATTTCTCC AACATCTTAA 8400 

CATCCCTGAA GATATATATA CAGTATATAT ATTAGTGTCA TAATGCTTGA CCATAACAAT 8460 

TTTATATCAT TCAACCATAA AACAACCTTA ATAAGGTTAT GGGACAAAAT GGATCCCATT 8520 

ATTAATGGAA ACTCTGCCAA TGTGTATCTA ACTGATAGTT ATCTAAAAGG TGTTATCTCT 8580 

TTTTCAGAAT GTAATGCTTT AGGGAGTTAC CTTTTTAACG GCCCCTATCT TAAAAATGAT 8640 

TACACCAACT TAATTAGTAG ACAAAGCCCA CTACTAGAGC ATATGAATCT AAAAAAACTA 8700 

ACTATAACAC AGTCATTAAT ATCTAGATAT CATAAAGGTG AACTGAAGTT AGAAGAACCA 8760 

ACTTATTTCC AGTCATTACT TATGACATAT AAAAGTATGT CCTCGTCTGA ACAAATTGCT 8820 

ACAACTAATT TACTTAAAAA AATAATACGA AGAGCTATAG AAATAAGTGA TGTAAAGGTG 8880 

TACGCCATCT TGAATAAACT GGGACTAAAG GAAAAGGACA GAGTTAAGCC CAACAATAAT 8940 

TCAGGTGATG AAAACTCAGT TCTTACAACC ATAATCAAAG ATGATATACT TTCAGCTGTG 9000 

GAAAACAATC AATCATATAC AAATTCAGAC AAAAATCATT CAGTAAATCA AAATATCACT 9060 

ATCAAAACAA CACTCTTGAA AAAATTGATG TGTTCAATGC AACATCCTCC ATCATGGTTA 9120 

ATACACTGGT TCAATTTATA TACAAAATTA AATAACATAT TAACACAATA TCGATCAAAT 9180 

GAGGTAAAAA GTCATGGGTT TATATTAATA GATAATCAAA CTTTAAGTGA TTTTCAGTTT 9240 

ATTTTAAATC AATATGGTTG TATCGTTTAT CATAAAGGAC TCAAAAAAAT CACAACTACT 9300 

ACTTACAATC AATTTTTGAC ATGGAAAGAC ATCAGCCTTA GCAGATTAAA TGTTTGCTTA 9360 

ATTACTTGGA TAAGTAATTG TTTAAATACA TTAAATAAAA GCTTAGGGCT GAGATGTGGA 9420 

TTCAATAATG TTGTGTTATC ACAACTATTT CTTTATGGAG ATTGTATACT GAAATTATTC 9480 

CATAATGAAG GCTTCTACAT AATAAAAGAA GTAGAGGGAT TTATTATGTC TTTAATTCTA 9540 

AACATAACAG AAGAAGATCA ATTTAGGAAA CGATTTTATA ATAGCATGCT AAATAACATC 9600 

ACAGATGCAG CTATTAAGGC TCAAAAAAAC CTACTATCAA GAGTATGTCA CACTTTATTA 9660 

GACAAGACAG TGTCTGATAA TATCATAAAT GGTAAATGGA TAATCCTATT AAGTAAATTT 9720 

CTTAAATTGA TTAAGCTTGC AGGTGATAAT AATCTCAATA ACTTGAGTGA GCTTTATTTT 9780 

CTCTTCAGAA TCTTTGGACA TCCAATGGTC GATGAAAGAC AAGCAATGGA TGCTGTAAGA 9840 

ATTAACTGTA ATGAAACCAA GTTCTACTTA TTAAGTAATC TAAGTACGTT AAGAGGTGCT 9900 

TTCATTTATA GAATCATAAA GGGGTTTGTA AATACCTACA ACAGATGGCC CACTTTAAGG 9960 

AATGCTATTG TTCTACCTCT AAGATGGTTG AACTATTATA AACTTAATAC TTATCCATCT 10020 



CTACTTGAAA TCACAGAGAA AGATTTGATT ATTTTATCAG GATTGCGGTT CTATCGTGAG 10080 
TTTCATCTGC CTAAAAAAGT GGATCTTGAA ATGATAATAA ATGACAAAGC CATTTCACCT 10140 
CCAAAAGATT TAATATGGAC TAGTTTTCCT AGAAATTACA TGCCATCACA TATACAAAAT 10200 
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TATATAGAAC ATGAAAAGTT GAAGTTCTCT GAAAGTGACA GATCAAGAAG AGTACTAGAG 
TATTACTTGA GAGATAATAA ATTCAATGAA TGCGATCTAT ACAATTGTGT GGTCAATCAA 
AGCTATCTCA ACAACTCTAA CCATGTGGTA TCACTAACTG GTAAAGAAAG AGAGCTCAGT 
GTAGGTAGAA TGTTTGCTAT GCAACCAGGT ATGTTTAGGC AAATTCAAAT CTTAGCAGAG 
AAAATGATAG CCGAAAATAT TTTACAATTC TTCCCTGAGA GTTTGACAAG ATATGGTGAT 
CTAGAGCTTC AAAAGATATT AGAATTAAAA GCAGOAATAA GCAACAAGTC AAATCGTTAT 
AATGATAACT ACAACAATTA TATCAGTAAA TGTTCTATCA TTACAGACCT TAGCAAATTC 
AATCAAGCAT TTAGATATGA AACATCATGT ATCTGCAGTG ATGTATTAGA TGAACTGCAT 
GGAGTACAAT CTCTGTTCTC TTGGTTGCAT TTAACAATAC CTCTTGTCAC AATAATATGT 
ACATATAGAC ATGCACCTCC TTTTATAAAG GATCATGTTG TTAATCTTAA TAAAGTTGAT 
GAACAAAGTG GATTATACAG ATATCATATG GGTGGTATTG AAGGCTGGTG TCAAAAACTG 
TGGACCATTG AAGCTATATC ATTATTAGAT CTAATATCTC TCAAAGGGAA ATTCTCTATC 
ACAGCTCTAA TAAATGGTGA TAATCAGTCA ATTGATATAA GTAAACCAGT TAGACTTATA 
GAGGGTCAGA CCCATGCTCA AGCAGATTAT TTGTTAGCAT TAAATAGCCT TAAATTGCTA 
TATAAAGAGT ATGCGGGCAT AGGCCACAAG CTCAAGGGAA CAGAGACCTA TATATCCCGA 
GATATGCAAT TCATGAGCAA AACAATCCAG CACAATGGAG TGTACTATCC AGCCAGTATC 
AAAAAAGTCC TGAGAGTAGG TCCATGGATA AATACAATAC TTGATGATTT TAAAGTTAGT 
TTAGAATCTA TAGGTAGCTT AACACAGGAG TTAGAATATA GAGGAGAGAG CTTATTATGC 
AGTTTAATAT TTAGGAACAT TTGGTTATAC AATCAAATTG CTTTGCAACT CCGAAATCAT 
GCATTATGTC ACAATAAGCT ATATTTAGAT ATATTGAAAG TATTAAAACA CTTAAAAACT 
TTTTTTAATC TTGATAGTAT TGATATGGCT TTAACATTGT ATATGAATTT GCCTATGCTG 
TTTGGTGGTG GTGATCCTAA TTTGTTATAT CGAAGCTTTT ATAGGAGAAC TCCAGACTTC 
CTTACAGAAG CTATAGTACA TTCAGTGTTT GTGTTGAGCT ATTATACTGG TCACGATTTA 
CAAGATAAGC TCCAGGATCT TCCAGATGAT AGACTGAACA AATTCTTGAC ATGTAtCATC 
ACGTTTGATA AAAATCCCAA TGCCGAGTTT GTAACATTGA TGAGAGATCC ACAGGCTTTA 
GGGTCTGAAA GGCAAGCAAA AATTACTAGT GAGATTAATA GATTAGCAGT GACAGAAGTC
TTAAGTATAG CTCCAAACAA AATATTTTCT AAAAGTGCAC AACATTATAC TACCACTGAG 11820

ATTGATCTAA ATGATATTAT GCAAAATATA GAACCAACTT ACCCTCATGG ATTAAGAGTT 11880 

GTTTATGAAA GTTTACCTTT TTATAAAGCA GAAAAAATAG TTAATCTTAT ATCAGGAACA 11940 

AAATCCATAA CTAATATACT TGAAAAAACA TCAGCAATAG ATTCAACTGA TATTAATAGG 12000

GCTACTGATA TGATGAGGAA AAATATAACT TTACTTATAA GGATACTTCC ACTAGATTGT 12060 

AACAAAGACA AAAGAGAGTT ATTAAGTTTA GAAAATCTTA GTATAACTGA ATTAAGCAAG 12120 

TATGTAAGAG AAAGATCTTG GTCGTTATCC AATATAGTAG GAGTAACATC GCCAAGTATT 12180 

ATGTTCACAA TGGACATTAA ATATACAACT AGCACTATAG CCAGTGGTAT AATTATAGAA 12240 

AAATATAATG TTAATAGTTT AACTCGTGGT GAAAGAGGAC CTACTAAGCC ATGGGTAGGT 12300 

TCATCTACGC AGGAGAAAAA AACAATGCCA GTGTACAATA GACAAGTTTT AACCAAAAAG 12360 

CAAAGAGACC AAATAGATTT ATTAGCAAAA TTAGACTGGG TATATGCATC CATAGACAAC 12420 

AAAGATGAAT TCATGGAAGA ACTGAGTACT GGAACACTTG GACTGTCATA TGAGAAAGCC 12480 

AAAAAATTGT TTCCACAATA TCTAAGTGTC AATTATTTAC ACCGCTTAAC AGTCAGTAGT 12540 

AGACCATGTG AATTCCCTGC ATCAATACCA GCTTATAGAA CAACAAATTA TCATTTCGAT 12600 

ACTAGTCCTA TCAACCATGT ATTAACAGAA AAGTATGGAG ATGAAGATAT CGACATTGTG 12660 

TTTCAAAATT GCATAAGTTT TGGTCTTAGC TTAATGTCGG TTGTGGAACA ATTCACAAAC 12720 

ATATGTCCTA ATAGAATTAT TCTCATACCG AAGCTGAATG AGATACATTT GATGAAACCT 12780 

CCTATATTTA CAGGAGATGT TGATATCATC AAGTTGAAGC AAGTGATACA AAAACAGCAC 12840 

ATGTTCCTAC CAGATAAAAT AAGTTTAACC CAATATGTAG AATTATTCCT AAGTAACAAA 12900 

GCACTTAAAT CTGGATCTCA CATCAACTCT AATTTAATAT TAGTACATAA AATGTCTGAT 12960 

TATTTTCATA ATGCTTATAT TTTAAGTACT AATTTAGCTG GACATTGGAT TCTGATTATT 13020 

CAACTTATGA AGGATTCAAA AGGTATTTTT GAAAAAGATT GGGGAGAGGG GTATATAACT 13080 

GATCATATGT TCATTAATTT GAATGTTTTC TTTAATGCTT ATAAGACTTA TTTGCTATGT 13140 

TTTCATAAAG GTTATGGTAA AGCAAAATTA GAATGTGATA TGAACACTTC AGATCTTCTT 13200 

TGTGTTTTGG AGCTAATAGA CAGTAGCTAC TGGAAATCTA TGTCTAAAGT TTTCCTAGAA 13260 

CAAAAAGTCA TAAAATACAT AATCAATCAA GACACAAGTT TGCATAGAAT AAAAGGTTGT 13320

CATAGTTTTA AGTTATGGTT TTTAAAACGC CTTAATAATG CTAAATTTAC CGTATGCCCT 13380

TGGGTTGTTA ACATAGATTA TCACCCAACA CACATGAAAG CTATATTATC TTACATAGAT 13440 

TTAGTTAGAA TGGGGTTAAT AAATGTAGAT AAATTAACCA TTAAAAATAA AAATAAATTC 13500 

AATGATGAAT TTTACACATC AAATCTCTTT TACATTAGTT ATAACTTTTC AGATAACACT 13560 

CATTTGCTAA CAAAACAAAT AAGAATTGCT AATTCAGAAT TAGAAAATAA TTATAACAAA 13620 

CTATATCACC CAACCCCAGA AACTTTAGAA AATATGTCAT TAATTCCTGT CAAAAGTAAT 13680 

AATAGTAATA AACCTAAATT TGGTATAAGT GGAAATACCG AATCTATGAT GACGTCAACA 13740 

TTCTCCAATA AAACGCATAT TAAATCTTCC GCTGTTATTA CAAGATTCAA TTATAGTAAA 13800 

CAAGACTTGT ACAATTTATT TCCAATTGTC GTGATAGACA GGATTATAGA TCATTCAGGT 13860 

AATACAGCAA AATCTAACCA ACTCTACACT ACCACTTCAC ATCAGACATC TTTAGTAAGG 13920 

AATAGTGCAT CACTTTATTG CATGCTTCCT TGGCATCATG TCAATAGATT TAACTTTGTA 13980 

TTTAGTTCCA CAGGATGCAA GATCAGTATA GAGTATATTT TAAAAGATCT TAAGATTAAA 14040 

GACCCCAGTT GTATAGCATT CATAGGTGAA GGAGCTGGTA ACTTATTATT ACGTACAGTA 14100 

GTAGAACTTC ATCCAGACAT AAGATACATT TACAGAAGTT TAAAAGATTG CAATGATCAT 14160 

AGTTTACCTA TTGAATTTCT AAGGTTATAC AACGGGCATA TAAACATAGA TTATGGTGAG 14220 

AATTTAACCA TTCCTGCTAC AGATGCAACT AATAACATTC ATTGGTCTTA TTTACATATA 14280 

AAATTTGCAG AACCTATTAG CATTTTTGTC TGCGATGCTG AATTACCTGT TACAGCCAAT 14340 

TGGAGTAAAA TTATAATTGA ATGGAGTAAG CATGTAAGAA AGTGCAAGTA CTGTTCCTCT 14400 

GTAAATAGAT GCATTTTAAT TGCAAAATAT CATGCCCAAG ATGATATTGA TTTCAAATTA 14460 

GATAACATTA CTATATTAAA AACTTACGTG TGCCTAGGTA GCAAGTTAAA AGGATCTGAA 14520

GTTTACTTAG TCCTTACAAT AGGCCCTGCA AATATACTTC CTGTTTTTAA TGTTGTGCAA 14580 

AATGCTAAAT TGATTCTTTC AAGGACTAAA AATTTCATTA TGCCTAAAAA AACTGACAAA 14640 

GAATCTATCG ATGCAAATAT TAAAAGCTTA ATACCTTTCC TTTGTTACCC TATAACAAAA 14700 

AAAGGAATTA AGACTTCATT GTCAAAATTG AAGAGTGTAG TTAGTGGAGA TATATTATCA 14760 

TATTCTATAG CTGGACGTAA TGAAGTATTC AGCAACAAGC TTATAAACCA CAAGCATATG 14820 

AATATCCTAA AATGGCTAGA TCATGTTTTA AACTTTAGAT CAGCTGAACT TAATTACAAT 14880

CATTTATATA TGATAGAGTC CACATATCCT TACTTAAGTG AATTGTTAAA CAGTTTAACA 14940

ACCAATGAGC TCAAGAAGCT GATTAAAATA ACAGGTAGTG TACTATACAA CCTTCCCAAC 15000 

GAACAGTAAC TTAAAACATC ATTAACAAGT TTGATCAAAT TTAGATGCTA ACACATCATA 15060 

ATATTATAGT TATTAAAAAA TATATATGCA AACTTTTCAA TAATTTAGCA TATTGATTCC 15120 

AAAGTTATCA TTTTGGTCTT AAGGGGTTGA ATAAAAATCT AAAACTAACA ATTATACATG 15180 

TGCATTTACA ACACAACGAG ACATTAGTTT TTGACACTTT TTTTCTCGT 15229 
(2) INFORMATION FOR SEQ ID NO:26:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2166 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: 

Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu Thr Asp
1 5 10 15

Ser Tyr Leu Lys Gly Val Ile Ser Phe Ser Glu Cys Asn Ala Leu Gly
20 25 30

Ser Tyr Leu Phe Asn Gly Pro Tyr Leu Lys Asn Asp Tyr Thr Asn Leu
35 40 45

Ile Ser Arg Gln Ser Pro Leu Leu Glu His Met Asn Leu Lys Lys Leu
50 55 60

Thr Ile Thr Gln Ser Leu Ile Ser Arg Tyr His Lys Gly Glu Leu Lys
65 70 75 80

Leu Glu Glu Pro Thr Tyr Phe Gln Ser Leu Leu Met Thr Tyr Lys Ser
85 90 95

Met Ser Ser Ser Glu Gln Ile Ala Thr Thr Asn Leu Leu Lys Lys Ile
100 105 110

Ile Arg Arg Ala Ile Glu Ile Ser Asp Val Lys Val Tyr Ala Ile Leu
115 120 125

Asn Lys Leu Gly Leu Lys Glu Lys Asp Arg Val Lys Pro Asn Asn Asn
130 135 140

Ser Gly Asp Glu Asn Ser Val Leu Thr Thr Ile Ile Lys Asp Asp Ile
145 150 155 160

Leu Ser Ala Val Glu Asn Asn Gln Ser Tyr Thr Asn Ser Asp Lys Asn
165 170 175

His Ser Val Asn Gln Asn Ile Thr Ile Lys Thr Thr Leu Leu Lys Lys
180 185 190

Leu Met Cys Ser Met Gln His Pro Pro Ser Trp Leu Ile His Trp Phe
195 200 205

Asn Leu Tyr Thr Lys Leu Asn Asn Ile Leu Thr Gln Tyr Arg Ser Asn
210 215 220

Glu Val Lys Ser His Gly Phe Ile Leu Ile Asp Asn Gln Thr Leu Ser
225 230 235 240

Asp Phe Gln Phe Ile Leu Asn Gln Tyr Gly Cys Ile Val Tyr His Lys
245 250 255

Gly Leu Lys Lys Ile Thr Thr Thr Thr Tyr Asn Gln Phe Leu Thr Trp
260 265 270

Lys Asp Ile Ser Leu Ser Arg Leu Asn Val Cys Leu Ile Thr Trp Ile
275 280 285

Ser Asn Cys Leu Asn Thr Leu Asn Lys Ser Leu Gly Leu Arg Cys Gly
290 295 300

Phe Asn Asn Val Val Leu Ser Gln Leu Phe Leu Tyr Gly Asp Cys Ile
305 310 315 320

Leu Lys Leu Phe His Asn Glu Gly Phe Tyr Ile Ile Lys Glu Val Glu
325 330 335

Gly Phe Ile Met Ser Leu Ile Leu Asn Ile Thr Glu Glu Asp Gln Phe
340 345 350

Arg Lys Arg Phe Tyr Asn Ser Met Leu Asn Asn Ile Thr Asp Ala Ala
355 360 365

Ile Lys Ala Gln Lys Asn Leu Leu Ser Arg Val Cys His Thr Leu Leu
370 375 380

Asp Lys Thr Val Ser Asp Asn Ile Ile Asn Gly Lys Trp Ile Ile Leu
385 390 395 400

Leu Ser Lys Phe Leu Lys Leu Ile Lys Leu Ala Gly Asp Asn Asn Leu
405 410 415

Asn Asn Leu Ser Glu Leu Tyr Phe Leu Phe Arg Ile Phe Gly His Pro
420 425 430

Met Val Asp Glu Arg Gln Ala Met Asp Ala Val Arg Ile Asn Cys Asn
435 440 445

Glu Thr Lys Phe Tyr Leu Leu Ser Asn Leu Ser Thr Leu Arg Gly Ala
450 455 460

Phe Ile Tyr Arg Ile Ile Lys Gly Phe Val Asn Thr Tyr Asn Arg Trp
465 470 475 480

Pro Thr Leu Arg Asn Ala Ile Val Leu Pro Leu Arg Trp Leu Asn Tyr
485 490 495

Tyr Lys Leu Asn Thr Tyr Pro Ser Leu Leu Glu Ile Thr Glu Lys Asp
500 505 510

Leu Ile Ile Leu Ser Gly Leu Arg Phe Tyr Arg Glu Phe His Leu Pro
515 520 525

Lys Lys Val Asp Leu Glu Met Ile Ile Asn Asp Lys Ala Ile Ser Pro
530 535 540

Pro Lys Asp Leu Ile Trp Thr Ser Phe Pro Arg Asn Tyr Met Pro Ser
545 550 555 560

His Ile Gln Asn Tyr Ile Glu His Glu Lys Leu Lys Phe Ser Glu Ser
565 570 575

Asp Arg Ser Arg Arg Val Leu Glu Tyr Tyr Leu Arg Asp Asn Lys Phe
580 585 590

Asn Glu Cys Asp Leu Tyr Asn Cys Val Val Asn Gln Ser Tyr Leu Asn
595 600 605

Asn Ser Asn His Val Val Ser Leu Thr Gly Lys Glu Arg Glu Leu Ser
610 615 620

Val Gly Arg Met Phe Ala Met Gln Pro Gly Met Phe Arg Gln Ile Gln
625 630 635 640

Ile Leu Ala Glu Lys Met Ile Ala Glu Asn Ile Leu Gln Phe Phe Pro
645 650 655

Glu Ser Leu Thr Arg Tyr Gly Asp Leu Glu Leu Gln Lys Ile Leu Glu
660 665 670

Leu Lys Ala Gly Ile Ser Asn Lys Ser Asn Arg Tyr Asn Asp Asn Tyr
675 680 685

Asn Asn Tyr Ile Ser Lys Cys Ser Ile Ile Thr Asp Leu Ser Lys Phe
690 695 700

Asn Gln Ala Phe Arg Tyr Glu Thr Ser Cys Ile Cys Ser Asp Val Leu
705 710 715 720

Asp Glu Leu His Gly Val Gln Ser Leu Phe Ser Trp Leu His Leu Thr
725 730 735

Ile Pro Leu Val Thr Ile Ile Cys Thr Tyr Arg His Ala Pro Pro Phe
740 745 750

Ile Lys Asp His Val Val Asn Leu Asn Lys Val Asp Glu Gln Ser Gly
755 760 765

Leu Tyr Arg Tyr His Met Gly Gly Ile Glu Gly Trp Cys Gln Lys Leu
770 775 780

Trp Thr Ile Glu Ala Ile Ser Leu Leu Asp Leu Ile Ser Leu Lys Gly
785 790 795 800

Lys Phe Ser Ile Thr Ala Leu Ile Asn Gly Asp Asn Gln Ser Ile Asp
805 810 815

Ile Ser Lys Pro Val Arg Leu Ile Glu Gly Gln Thr His Ala Gln Ala
820 825 830

Asp Tyr Leu Leu Ala Leu Asn Ser Leu Lys Leu Leu Tyr Lys Glu Tyr
835 840 845

Ala Gly Ile Gly His Lys Leu Lys Gly Thr Glu Thr Tyr Ile Ser Arg
850 855 860

Asp Met Gln Phe Met Ser Lys Thr Ile Gln His Asn Gly Val Tyr Tyr
865 870 875 880

Pro Ala Ser Ile Lys Lys Val Leu Arg Val Gly Pro Trp Ile Asn Thr
885 890 895

Ile Leu Asp Asp Phe Lys Val Ser Leu Glu Ser Ile Gly Ser Leu Thr
900 905 910

Gln Glu Leu Glu Tyr Arg Gly Glu Ser Leu Leu Cys Ser Leu Ile Phe
915 920 925

Arg Asn Ile Trp Leu Tyr Asn Gln Ile Ala Leu Gln Leu Arg Asn His
930 935 940

Ala Leu Cys His Asn Lys Leu Tyr Leu Asp Ile Leu Lys Val Leu Lys
945 950 955 960

His Leu Lys Thr Phe Phe Asn Leu Asp Ser Ile Asp Met Ala Leu Thr
965 970 975

Leu Tyr Met Asn Leu Pro Met Leu Phe Gly Gly Gly Asp Pro Asn Leu
980 985 990

Leu Tyr Arg Ser Phe Tyr Arg Arg Thr Pro Asp Phe Leu Thr Glu Ala
995 1000 1005

Ile Val His Ser Val Phe Val Leu Ser Tyr Tyr Thr Gly His Asp Leu
1010 1015 1020

Gln Asp Lys Leu Gln Asp Leu Pro Asp Asp Arg Leu Asn Lys Phe Leu
1025 1030 1035 1040

Thr Cys Ile Ile Thr Phe Asp Lys Asn Pro Asn Ala Glu Phe Val Thr
1045 1050 1055

Leu Met Arg Asp Pro Gln Ala Leu Gly Ser Glu Arg Gln Ala Lys Ile
1060 1065 1070

Thr Ser Glu Ile Asn Arg Leu Ala Val Thr Glu Val Leu Ser Ile Ala
1075 1080 1085

Pro Asn Lys Ile Phe Ser Lys Ser Ala Gln His Tyr Thr Thr Thr Glu
1090 1095 1100

Ile Asp Leu Asn Asp Ile Met Gln Asn Ile Glu Pro Thr Tyr Pro His
1105 1110 1115 1120

Gly Leu Arg Val Val Tyr Glu Ser Leu Pro Phe Tyr Lys Ala Glu Lys
1125 1130 1135

Ile Val Asn Leu Ile Ser Gly Thr Lys Ser Ile Thr Asn Ile Leu Glu
1140 1145 1150

Lys Thr Ser Ala Ile Asp Ser Thr Asp Ile Asn Arg Ala Thr Asp Met
1155 1160 1165

Met Arg Lys Asn Ile Thr Leu Leu Ile Arg Ile Leu Pro Leu Asp Cys
1170 1175 1180

Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser Ile Thr
1185 1190 1195 1200

Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn Ile
1205 1210 1215

Val Gly Val Thr Ser Pro Ser Ile Met Phe Thr Met Asp Ile Lys Tyr
1220 1225 1230

Thr Thr Ser Thr Ile Ala Ser Gly Ile Ile Ile Glu Lys Tyr Asn Val
1235 1240 1245

Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly
1250 1255 1260

Ser Ser Thr Gln Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gln Val
1265 1270 1275 1280

Leu Thr Lys Lys Gln Arg Asp Gln Ile Asp Leu Leu Ala Lys Leu Asp
1285 1290 1295

Trp Val Tyr Ala Ser Ile Asp Asn Lys Asp Glu Phe Met Glu Glu Leu
1300 1305 1310

Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe
1315 1320 1325

Pro Gln Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser
1330 1335 1340

Arg Pro Cys Glu Phe Pro Ala Ser Ile Pro Ala Tyr Arg Thr Thr Asn
1345 1350 1355 1360

Tyr His Phe Asp Thr Ser Pro Ile Asn His Val Leu Thr Glu Lys Tyr
1365 1370 1375

Gly Asp Glu Asp Ile Asp Ile Val Phe Gln Asn Cys Ile Ser Phe Gly
1380 1385 1390

Leu Ser Leu Met Ser Val Val Glu Gln Phe Thr Asn Ile Cys Pro Asn
1395 1400 1405

Arg Ile Ile Leu Ile Pro Lys Leu Asn Glu Ile His Leu Met Lys Pro
1410 1415 1420

Pro Ile Phe Thr Gly Asp Val Asp Ile Ile Lys Leu Lys Gln Val Ile
1425 1430 1435 1440

Gln Lys Gln His Met Phe Leu Pro Asp Lys Ile Ser Leu Thr Gln Tyr
1445 1450 1455

Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His Ile
1460 1465 1470

Asn Ser Asn Leu Ile Leu Val His Lys Met Ser Asp Tyr Phe His Asn
1475 1480 1485

Ala Tyr Ile Leu Ser Thr Asn Leu Ala Gly His Trp Ile Leu Ile Ile
1490 1495 1500

Gln Leu Met Lys Asp Ser Lys Gly Ile Phe Glu Lys Asp Trp Gly Glu
1505 1510 1515 1520

Gly Tyr Ile Thr Asp His Met Phe Ile Asn Leu Asn Val Phe Phe Asn
1525 1530 1535

Ala Tyr Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys Ala
1540 1545 1550

Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu
1555 1560 1565

Leu Ile Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu
1570 1575 1580

Gln Lys Val Ile Lys Tyr Ile Ile Asn Gln Asp Thr Ser Leu His Arg
1585 1590 1595 1600

Ile Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asn
1605 1610 1615

Asn Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn Ile Asp Tyr His
1620 1625 1630

Pro Thr His Met Lys Ala Ile Leu Ser Tyr Ile Asp Leu Val Arg Met
1635 1640 1645

Gly Leu Ile Asn Val Asp Lys Leu Thr Ile Lys Asn Lys Asn Lys Phe
1650 1655 1660

Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr Ile Ser Tyr Asn Phe
1665 1670 1675 1680

Ser Asp Asn Thr His Leu Leu Thr Lys Gln Ile Arg Ile Ala Asn Ser
1685 1690 1695

Glu Leu Glu Asn Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr
1700 1705 1710

Leu Glu Asn Met Ser Leu Ile Pro Val Lys Ser Asn Asn Ser Asn Lys
1715 1720 1725

Pro Lys Phe Gly Ile Ser Gly Asn Thr Glu Ser Met Met Thr Ser Thr
1730 1735 1740

Phe Ser Asn Lys Thr His Ile Lys Ser Ser Ala Val Ile Thr Arg Phe
1745 1750 1755 1760

Asn Tyr Ser Lys Gln Asp Leu Tyr Asn Leu Phe Pro Ile Val Val Ile
1765 1770 1775

Asp Arg Ile Ile Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gln Leu
1780 1785 1790

Tyr Thr Thr Thr Ser His Gln Thr Ser Leu Val Arg Asn Ser Ala Ser
1795 1800 1805

Leu Tyr Cys Met Leu Pro Trp His His Val Asn Arg Phe Asn Phe Val
1810 1815 1820

Phe Ser Ser Thr Gly Cys Lys Ile Ser Ile Glu Tyr Ile Leu Lys Asp
1825 1830 1835 1840

Leu Lys Ile Lys Asp Pro Ser Cys Ile Ala Phe Ile Gly Glu Gly Ala
1845 1850 1855

Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp Ile Arg
1860 1865 1870

Tyr Ile Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro Ile
1875 1880 1885

Glu Phe Leu Arg Leu Tyr Asn Gly His Ile Asn Ile Asp Tyr Gly Glu
1890 1895 1900

Asn Leu Thr Ile Pro Ala Thr Asp Ala Thr Asn Asn Ile His Trp Ser
1905 1910 1915 1920

Tyr Leu His Ile Lys Phe Ala Glu Pro Ile Ser Ile Phe Val Cys Asp
1925 1930 1935

Ala Glu Leu Pro Val Thr Ala Asn Trp Ser Lys Ile Ile Ile Glu Trp
1940 1945 1950

Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys
1955 1960 1965

Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu
1970 1975 1980

Asp Asn Ile Thr Ile Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu
1985 1990 1995 2000

Lys Gly Ser Glu Val Tyr Leu Val Leu Thr Ile Gly Pro Ala Asn Ile
2005 2010 2015

Leu Pro Val Phe Asn Val Val Gln Asn Ala Lys Leu Ile Leu Ser Arg
2020 2025 2030

Thr Lys Asn Phe Ile Met Pro Lys Lys Thr Asp Lys Glu Ser Ile Asp
2035 2040 2045

Ala Asn Ile Lys Ser Leu Ile Pro Phe Leu Cys Tyr Pro Ile Thr Lys
2050 2055 2060

Lys Gly Ile Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Ser Gly
2065 2070 2075 2080

Asp Ile Leu Ser Tyr Ser Ile Ala Gly Arg Asn Glu Val Phe Ser Asn
2085 2090 2095

Lys Leu Ile Asn His Lys His Met Asn Ile Leu Lys Trp Leu Asp His
2100 2105 2110

Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met
2115 2120 2125

Ile Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr
2130 2135 2140

Thr Asn Glu Leu Lys Lys Leu Ile Lys Ile Thr Gly Ser Val Leu Tyr
2145 2150 2155 2160

Asn Leu Pro Asn Glu Gln
2165

(2) INFORMATION FOR SEQ ID NO:27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15219 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

ACGGGAAAAA AATGCGTACT ACAAACTTGC ACATTCGAAA AAAATGGGGC AAATAAGAAC 60 

TTGATAAGTG CTATTTAAGT CTAACCTTTT CAATCAGAAA TGGGGTGCAA TTCACTGAGC 120 

ATGATAAAGG TTAGATTACA AAATTTATTT GACAATGACG AAGTAGCATT GTTAAAAATA 180 

ACATGTTATA CTGATAAATT AATTCTTCTG ACCAATGCAT TAGCCAAAGC AGCAATACAT 240 

ACAATTAAAT TAAACGGCAT AGTTTTTATA CATGTTATAA CAAGCAGTGA AGTGTGCCCT 300 

GATAACAATA TTGTAGTGAA ATCTAACTTT ACAACAATGC CAATACTACA AAATGGAGGA 360 

TACATATGGG AATTGATTGA GTTGACACAC TGCTCTCAAT TAAACGGTTT AATGGATGAT 420 

AATTGTGAAA TCAAATTTTC TAAAAGACTA AGTGACTCAG TAATGACTAA TTATATGAAT 480 

CAAATATCTG ACTTACTTGG GCTTGATCTC AATTCATGAA TTATGTTTAG TCTAATTCAA 540

TAGACATGTG TTTATTACCA TTTTAGTTAA TATAAAAACT CATCAAAGGG AAATGGGGCA 600

AATAAACTCA CCTAATCAAT CAAACCATGA GCACTACAAA TGACAACACT ACTATGCAAA 660 

GATTGATGAT CACAGACATG AGACCCCTGT CAATGGATTC AATAATAACA TCTCTTACCA 720 

AAGAAATCAT CACACACAAA TTCATATACT TGATAAACAA TGAATGTATT GTAAGAAAAC 780 

TTGATGAAAG ACAAGCTACA TTTACATTCT TAGTCAATTA TGAGATGAAG CTACTGCACA 840 

AAGTAGGGAG TACCAAATAC AAAAAATACA CTGAATATAA TACAAAATAT GGCACTTTCC 900 

CCATGCCTAT ATTTATCAAT CACGGCGGGT TTCTAGAATG TATTGGCATT AAGCCTACAA 960

AACACACTCC TATAATATAC AAATATGACC TCAACCCGTG AATTCCAACA AAAAAACCAA 1020 

CCCAACCAAA CCAAACTATT CCTCAAACAA CAGTGCTCAA TAGTTAAGAA GGAGCTAATC 1080 

CATTTTAGTA ATTAAAAATA AAAGTAAAGC CAATAACATA AATTGGGGCA AATACAAAGA 1140 

TGGCTCTTAG CAAAGTCAAG TTGAATGATA CATTAAATAA GGATCAGCTG CTGTCATCCA 1200 

GCAAATACAC TATTCAACGT AGTACAGGAG ATAATATTGA CACTCCCAAT TATGATGTGC 1260 

AAAAACACCT AAACAAACTA TGTGGTATGC TATTAATCAC TGAAGATGCA AATCATAAAT 1320 

TCACAGGATT AATAGGTATG TTATATGCTA TGTCCAGGTT AGGAAGGGAA GACACTATAA 1380 

AGATACTTAA AGATGCTGGA TATCATGTTA AAGCTAATGG AGTAGATATA ACAACATATC 1440 

GTCAAGATAT AAATGGAAAG GAAATGAAAT TCGAAGTATT AACATTATCA AGCTTGACAT 1500 

CAGAAATACA AGTCAATATT GAGATAGAAT CTAGAAAGTC CTACAAAAAA ATGCTAAAAG 1560 

AGATGGGAGA AGTGGCTCCA GAATATAGGC ATGATTCTCC AGACTGTGGG ATGATAATAC 1620 

TGTGTATAGC TGCACTTGTG ATAACCAAAT TAGCAGCAGG AGACAGATCA GGTCTTACAG 1680 

CAGTAATTAG GAGGGCAAAC AATGTCTTAA AAAACGAAAT AAAACGATAC AAGGGCCTCA 1740 

TACCAAAGGA TATAGCTAAC AGTTTTTATG AAGTGTTTGA AAAACACCCT CATCTTATAG 1800 

ATGTTTTCGT GCACTTTGGC ATTGCACAAT CATCCACAAG AGGGGGTAGT AGAGTTGAAG 1860 

GAATCTTTGC AGGATTGTTT ATGAATGCCT ATGGTTCAGG GCAAGTAATG CTAAGATGGG 1920 

GAGTTTTAGC CAAATCTGTA AAAAATATCA TGCTAGGACA TGCTAGTGTC CAGGCAGAAA 1980 

TGGAGCAAGT TGTGGAAGTC TATGAGTATG CACAGAAGTT GGGAGGAGAA GCTGGATTCT 2040 

ACCATATATT GAACAATCCA AAAGCATCAT TGCTGTCATT AACTCAATTT CCCAACTTCT 2100
CAAGTGTGGT CCTAGGCAAT GCAGCAGGTC TAGGCATAAT GGGAGAGTAT AGAGGTACAC
CAAGAAACCA GGATCTTTAT GATGCAGCTA AAGCATATGC AGAGCAACTC AAAGAAAATG 
GAGTAATAAA CTACAGTGTA TTAGACTTAA CAGCAGAAGA ATTGGAAGCC ATAAAGCATC 
AACTCAACCC CAAAGAAGAT GATGTAGAGC TTTAAGTTAA CAAAAAATAC GGGGCAAATA 
AGTCAACATG GAGAAGTTTG CACCTGAATT TCATGGAGAA GATGCAAATA ACAAAGCTAC 
CAAATTCCTA GAATCAATAA AGGGCAAGTT CGCATCATCC AAAGATCCTA AGAAGAAAGA 
TAGCATAATA TCTGTTAACT CAATAGATAT AGAAGTAACT AAAGAGAGCC CGATAACATC 
TGGCACCAAC ATCATCAATC CAACAAGTGA AGCCGACAGT ACCCCAGAAA CAAAAGCCAA 
CTACCCAAGA AAACCCCTAG TAAGCTTCAA AGAAGATCTC ACCCCAAGTG ACAACCCTTT 
TTCTAAGTTG TACAAGGAAA CAATAGAAAC ATTTGATAAC AATGAAGAAG AATCTAGCTA 
CTCATATGAA GAGATAAATG ATCAAACAAA TGACAACATT ACAGCAAGAC TAGATAGAAT 
TGATGAAAAA TTAAGTGAAA TATTAGGAAT GCTCCATACA TTAGTAGTTG CAAGTGCAGG 
ACCCACTTCA GCTCGCGATG GAATAAGAGA TGCTATGGTT GGTCTAAGAG AAGAGATGAT 
AGAAAAAATA AGAGCGGAAG CATTAATGAC CAATGATAGG TTAGAGGCTA TGGCAAGACT 
TAGGAATGAG GAAAGCGAAA AAATGGCAAA AGACACCTCA GATGAAGTGT CTCTTAATCC
AACTTCCAAA AAATTGAGTG ACTTGTTGGA AGACAACGAT AGTGACAATG ATCTATCACT 
TGATGATTTT TGATCAGCGA TCAACTCACT CAGCAATCAA CAACATCAAT AAAACAGACA 
TCAATCCATT GAATCAACTG CCAGACCGAA CAAACAAACG TCCATCAGTA GAACCACCAA 
CCAATCAATC AACCAATTGA TCAATCAGCA ACCCGACAAA ATTAACAATA TAGTAACAAA 
AAAAGAACAA GATGGGGCAA ATATGGAAAC ATACGTGAAC AAGCTTCACG AAGGCTCCAC 
ATACACAGCA GCTGTTCAGT ACAATGTTCT AGAAAAAGAT GATGATCCTG CATCACTAAC 
AATATGGGTG CCTATGTTCC AGTCATCTGT GCCAGCAGAC TTGCTCATAA AAGAACTTGC 
AAGCATCAAT ATACTAGTGA AGCAGATCTC TACGCCCAAA GGACCTTCAC TACGAGTCAC 
GATTAACTCA AGAAGTGCTG TGCTGGCTCA AATGCCTAGT AATTTCATCA TAAGCGCAAA 
TGTATCATTA GATGAAAGAA GCAAATTAGC ATATGATGTA ACTACACCTT GTGAAATCAA
AGCATGCAGT CTAACATGCT TAAAAGTAAA AAGTATGTTA ACTACAGTCA AAGATCTTAC
CATGAAGACA TTCAACCCCA CTCATGAGAT CATTGCTCTA TGTGAATTTG AAAATATTAT
GACATCAAAA AGAGTAATAA TACCAACCTA TCTAAGATCA ATTAGTGTCA AGAACAAGGA
TCTGAACTCA CTAGAAAATA TAGCAACCAC CGAATTCAAA AATGCTATCA CCAATGCAAA 
AATTATTCCT TATGCAGGAT TAGTGTTAGT TATCACAGTT ACTGACAATA AAGGAGCATT 
CAAATATATC AAACCACAGA GTCAATTTAT AGTAGATCTT GGTGCCTACC TAGAAAAAGA 
GAGCATATAT TATGTGACTA CTAATTGGAA GCATACAGCT ACACGTTTTT CAATCAAACC 
ACTAGAGGAT TAAACTTAAT TATCAACACT GAATGACAGG TCCACATATA TCCTCAAACT 
ACACACTATA TCCAAACATC ATAAACATCT ACACTACACA CTTCATCACA CAAACCAATC 
CCACTCAAAA TCCAAAATCA CTACCAGCCA CTATCCGCTA GACCTAGAGT GCGAATAGGC 
AAATAAAACC AAAATATGGG GTAAATAGAC ATTAGTTAGA GTTCAATCAA TCTTAACAAC 
CATTTATACC GCCAATTCAA CACATATACT ATAAATCTTA AAATGGGAAA TACATCCATC 
ACAATAGAAC TCACAAGCAA ATTTTGGCCC TATTTTACAC TAATACATAT GATCTTAACT 
CTAATCTTTT TACTAATTAT AATCACTATC ATGATTGCAA CACTAAATAA GCTAAGTGAA 
CACAAAGCAT TCTGCAACAA AACTCTTGAA CTAGGACAGA TGTACCAAAT CAACACACAG 
AGTTCCACCA TTATGCTGTG TCAAACCATA ATCCTGTATA TACAAACAAA CAAATCCAAT 
CCTCTCACAG AGTCACGGTG TCGCAAAACC ACGCTAACCA TCATGGTAGC ATAGAGTAGT 
TATTTAAAAA TTAACATAAT GATGAATTGT TAGTATGAGA TCAAAAACAA CATTGGGGCA 
AATGCAACCA TGTCCAAACA CAAGAATCAA CGCACTGCCA GGACTCTAGA AAAGACCTGG
GATACTCTTA ATCATCTAAT TGTAATATCC TCTTGTTTAT ACAGATTAAA TTTAAAATCT 
ATAGCACAAA TAGCACTATC AGTTTTGGCA ATGATAATCT CAACCTCTCT CATAATTGCA 
GCCATAATAT TCATCATCTC TGCCAATCAC AAAGTTACAC TAACAACGGT CACAGTTCAA 
ACAATAAAAA ACCACACTGA AAAAAACATC ACCACCTACC CTACTCAAGT CTCACCAGAA 
AGGGTTAGTT CATCCAAGCA ACCCACAACC ACATCACCAA TCCACACAAG TTCAGCTACA
ACATCACCCA ATACAAAATC AGAAACACAC CATACAACAG CACAAACCAA AGGCAGAACC 
ACCACTTCAA CACAGACCAA CAAGCCAAGC ACAAAACCAC GTCCAAAAAA TCCACCAAAA 
AAAGATGATT ACCATTTTGA AGTGTTCAAC TTCGTTCCCT GCAGTATATG TGGCAACAAT
CAACTTTGCA AATCCATCTG CAAAACAATA CCAAGCAACA AACCAAAGAA GAAACCAACC
ATCAAACCCA CAAACAAACC AACCACCAAA ACCACAAACA AAAGAGACCC AAAAACACCA 
GCCAAAACGA CGAAAAAAGA AACTACCACC AACCCAACAA AAAAACTAAC CCTCAAGACC 
ACAGAAAGAG ACACCAGCAC CTCACAATCC ACTGCACTCG ACACAACCAC ATTAAAACAC 
ACAGTCCAAC AGCAATCCCT CCTCTCAACC ACCCCCGAAA ACACACCCAA CTCCACACAA 
ACACCCACAG CATCCGAGCC CTCCACACCA AACTCCACCC AAAAAACCCA GCCACATGCT 
TAGTTATTCA AAAACTACAT CTTAGCAGAG AACCGTGATC TATCAAGCAA GAACGAAATT
AAACCTGGGG CAAATAACCA TGGAGTTGAT GATCCACAAG TCAAGTGCAA TCTTCCTAAC 
TCTTGCTATT AATGCATTGT ACCTCACCTC AAGTCAGAAC ATAACTGAGG AGTTTTACCA 
ATCGACATGT AGTGCAGTTA GCAGAGGTTA TTTTAGTGCT TTAAGAACAG GTTGGTATAC 
TAGTGTCATA ACAATAGAAT TAAGTAATAT AAAAGAAACC AAATGCAATG GAACTGACAC 
TAAAGTAAAA CTTATGAAAC AAGAATTAGA TAAGTATAAG AATGCAGTAA CAGAATTACA 
GCTACTTATG CAAAACACAC CAGCTGTCAA CAACCGGGCC AGAAGAGAAG CACCACAGTA 
TATGAACTAC ACAATCAATA CCACTAAAAA CCTAAATGTA TCAATAAGCA AGAAGAGGAA 
ACGAAGATTT CTAGGCTTCT TGTTAGGTGT GGGATCTGCA ATAGCAAGTG GTATAGCTGT 
ATCAAAAGTT CTACACCTTG AAGGAGAAGT GAACAAGATC AAAAATGCTT TGTTGTCTAC 
AAACAAAGCT GTAGTCAGTT TATCAAATGG GGTCAGTGTT TTAACCAGCA AAGTGTTAGA 
TCTCAAGAAT TACATAAATA ACCAATTATT ACCCATAGTA AATCAACAGA GCTGTCGCAT 
CTCCAACATT GAAACAGTTA TAGAATTCCA GCAGAAGAAC AGCAGATTGT TGGAAATCAC 
CAGAGAATTT AGTGTCAATG CAGGTGTAAC AACACCTTTA AGCACTTACA TGTTGACAAA 
CAGTGAGTTA CTATCATTAA TCAATGATAT GCCTATAACA AATGATCAGA AAAAATTAAT 
GTCAAGCAAT GTTCAGATAG TAAGGCAACA AAGTTATTCC ATCATGTCTA TAATAAAGGA 
AGAAGTCCTT GCATATGTTG TACAGCTGCC TATCTATGGT GTAATAGATA CACCTTGCTG 
GAAATTGCAC ACATCGCCTC TATGCACTAC CAACATCAAA GAAGGATCAA ATATTTGTTT 
AACAAGGACT GATAGAGGAT GGTATTGTGA TAATGCAGGA TCAGTATCCT TCTTTCCACA
GGCTGACACT TGTAAAGTAC AGTCCAATCG AGTATTTTGT GACACTATGA ACAGTTTGAC
ATTACCAAGT GAAGTCAGCC TTTGTAACAC TGACATATTC AATTCCAAGT ATGACTGCAA
AATTATGACA TCAAAAACAG ACATAAGCAG CTCAGTAATT ACTTCTCTTG GAGCTATAGT 
GTCATGCTAT GGTAAAACTA AATGCACTGC ATCCAACAAA AATCGTGGGA TTATAAAGAC 
ATTTTCTAAT GGTTGTGACT ATGTGTCAAA CAAAGGAGTA GATACTGTGT CAGTGGGCAA 
CACTTTATAC TATGTAAACA AGCTGGAAGG CAAGAACCTT TATGTAAAAG GGGAACCTAT 
AATAAATTAC TATGACCCTC TAGTGTTTCC TTCTGATGAG TTTGATGCAT CAATATCTCA 
AGTCAATGAA AAAATCAATC AAAGTTTAGC TTTTATTCGT AGATCTGATG AATTACTACA 
TAATGTAAAT ACTGGCAAAT CTACTACAAA TATTATGATA ACTACAATTA TTATAGTAAT 
CATTGTAGTA TTGTTATCAT TAATAGCTAT TGGTTTACTG TTGTATTGTA AAGCCAAAAA 
CACACCAGTT ACACTAAGCA AAGACCAACT AAGTGGAATC AATAATATTG CATTCAGCAA 
ATAGACAAAA AACCACCTGA TCATGTTTCA ACAACAATCT GCTGACCACC AATCCCAAAT 
CAACTTACAA CAAATATTTC AACATCACAG TACAGGCTGA ATCATTTCCT CACATCATGC 
TACCCACATA ACTAAGCTAG ATCCTTAACT TATAGTTACA TAAAAACCTC AAGTATCACA
ATCAACCACT AAATCAACAC ATCATTCACA AAATTAACAG CTGGGGCAAA TATGTCGCGA 
AGAAATCCTT GTAAATTTGA GATTAGAGGT CATTGCTTGA ATGGTAGAAG ATGTCACTAC 
AGTCATAATT ACTTTGAATG GCCTCCTCAT GCATTACTAG TGAGGCAAAA CTTCATGTTA 
AACAAGATAC TCAAGTCAAT GGACAAAAGC ATAGACACTT TGTCTGAAAT AAGTGGAGCT 
GCTGAACTGG ATAGAACAGA AGAATATGCT CTTGGTATAG TTGGAGTGCT AGAGAGTTAC 
ATAGGATCTA TAAACAACAT AACAAAACAA TCAGCATGTG TTGCTATGAG TAAACTTCTT 
ATTGAGATCA ATAGTGATGA CATTAAAAAG CTTAGAGATA ATGAAGAACC CAATTCACCT 
AAGATAAGAG TGTACAATAC TGTTATATCA TACATTGAGA GCAATAGAAA AAACAACAAG 
CAAACCATCC ATCTGCTCAA GAGACTACCA GCAGACGTGC TGAAGAAGAC AATAAAGAAC 
ACATTAGATA TCCACAAAAG CATAACCATA AGCAATCCAA AAGAGTCAAC TGTGAATGAT
CAAAATGACC AAACCAAAAA TAATGATATT ACCGGATAAA TATCCTTGTA GTATATCATC 
CATATTGATC TCAAGTGAAA GCATGGTTGC TACATTCAAT CATAAAAACA TATTACAATT 
TAACCATAAC TATTTGGATA ACCACCAGCG TTTATTAAAT CATATATTTG ATGAAATTCA
TTGGACACCT AAAAACTTAT TAGATGCCAC TCAACAATTT CTCCAACATC TTAACATCCC
TGAAGATATA TATACAGTAT ATATATTAGT GTCATAATGC TTGACCATAA CGACTCTATG 
TCATCCAACC ATAAAACTAT TTTGATAAGG TTATGGGACA AAATGGATCC CATTATTAAT 
GGAAACTCTG CTAATGTGTA TCTAACTGAT AGTTATTTAA AAGGTGTTAT CTCTTTTTCA 
GAGTGTAATG CTTTAGGGAG TTATCTTTTT AACGGCCCTT ATCTTAAAAA TGATTACACC 
AACTTAATTA GTAGACAAAG CCCACTACTA GAGCATATGA ATCTTAAAAA ACTAACTATA 
ACACAGTCAT TAATATCTAG ATATCATAAA GGTGAACTGA AATTAGAAGA ACCAACTTAT 
TTCCAGTCAT TACTTATGAC ATATAAAAGT ATGTCCTCGT CTGAACAAAT TGCTACAACT 
AACTTACTTA AAAAAATAAT ACGAAGAGCC ATAGAAATAA GTGATGTAAA GGTGTACGCC 
ATCTTGAATA AACTAGGATT AAAGGAAAAG GACAGAGTTA AGCCCAACAA TAATTCAGGT 
GATGAAAACT CAGTACTTAC AACCATAATT AAAGATGATA TACTTTCGGC TGTGGAAAAC 
AATCAATCAT ATACAAATTC AGACAAAAGT CACTCAGTAA ATCAAAATAT CACTATCAAA 
ACAACACTCT TGAAAAAATT GATGTGTTCA ATGCAACATC CTCCATCATG GTTAATACAC 
TGGTTCAATT TATATACAAA ATTAAATAAC ATATTAACAC AATATCGATC AAATGAGGTA 
AAAAGTCATG GGTTTATATT AATAGATAAT CAAACTTTAA GTGGTTTTCA GTTTATTTTA 
AATCAATATG GTTGTATCGT TTATCATAAA GGACTCAAAA AAATCACAAC TACTACTTAC
AATCAATTTT TGACATGGAA AGACATCAGC CTTAGCAGAT TAAATGTTTG CTTAATTACT 
TGGATAAGTA ATTGTTTAAA TACATTAAAC AAAAGCTTAG GGCTGAGATG TGGATTCAAT 
AATGTTGTGT TATCACAATT ATTTCTTTAT GGAGATTGTA TACTGAAATT ATTTCATAAT 
GAAGGCTTCT ACATAATAAA AGAAGTAGAG GGATTTATTA TGTCTTTAAT TCTAAACATA 
ACAGAAGAAG ATCAATTTAA GAAACGATTT TATAATAGCA TGCTAAATAA CATCACAGAT 
GCAGCTATTA AGGCTCAAAA GGACCTACTA TCAAGAGTAT GTCACACTTT ATTAGACAAG 
ACAGTGTCTG ATAATATCAT AAATGGTAAA TGGATAATCC TATTAAGTAA ATTTCTTAAA 
TTGATTAAGC TTGCAGGTGA TAATAATCTC AATAACTTGA GTGAGCTATA TTTTCTCTTC 
AGAATCTTTG GACATCCAAT GGTCGATGAA AGACAAGCAA TGGATTCTGT AAGAATTAAC
TGTAATGAAA CTAGGTTCTA CTTATTAAGT AGTCTAAGTA CATTAAGAGG TGCTTTCATT
TATAGAATCA TAAAAGGGTT TGTAAATACC TACAACAGAT GGCCCACCTT AAGGAATGCT
ATTGTCCTAC CTCTAAGATG GTTAAACTAC TATAAACTTA ATACTTATCC ATCTCTACTT 
GAAATCACAG AAAATGATTT GATTATTTTA TCAGGATTGC GGTTCTATCG TGAGTTTCAT 
CTGCCTAAAA AAGTGGATCT TGAAATGATA ATAAATGACA AAGCCATTTC ACCTCCAAAA 
GATCTAATAT GGACTAGTTT TCCTAGAAAT TACATGCCAT CACATATACA AAATTATATA 
GAACATGAAA AGTTGAAGTT CTCTGAAAGC GACAGATCGA GAAGAGTACT AGAGTATTAC 
TTGAGAGATA ATAAATTCAA TGAATGCGAT CTATACAATT GTGTAGTCAA TCAAAGCTAT 
CTCAACAACT CTAATCACGT GGTATCACTA ACTGGTAAAG AAAGAGAGCT CAGTGTAGGT 
AGAATGTTTG CTATGCAACC AGGTATGTTT AGGCAAATCC AAATCTTAGC AGAGAAAATG 
ATAGCTGAAA ATATTTTACA ATTCTTCCCT GAGAGTTTGA CAAGATATGG TGATCTAGAG 
CTTCAAAAGA TATTAGAATT AAAAGCAGGA ATAAGCAACA AGTCAAATCG TTATAATGAT 
AACTACAACA ATTATATCAG TAAATGTTCT ATCATTACAG ATCTTAGCAA ATTCAATCAG 
GCATTTAGAT ATGAAACATC ATGTATCTGC AGTGATGTAT TAGATGAACT GCATGGAGTA 
CAATCTCTGT TCTCTTGGTT GCATTTAACA ATACCTCTTG TCACAATAAT ATGTACATAT 
AGACATGCAC CTCCTTTCAT AAAGGATCAT GTTGTTAATC TTAATGAGGT TGATGAACAA 
AGTGGATTAT ACAGATATCA TATGGGTGGT ATTGAGGGCT GGTGTCAAAA ACTGTGGACC 
ATTGAAGCTA TATCATTATT AGATCTAATA TCTCTCAAAG GGAAATTCTC TATCACAGCT 
CTGATAAATG GTGATAATCA GTCAATTGAT ATAAGCAAAC CAGTTAGACT TATAGAGGGT 
CAGACCCATG CACAAGCAGA TTATTTGTTA GCATTAAATA GCCTTAAATT GTTATATAAA 
GAGTATGCAG GTATAGGCCA TAAGCTTAAG GGAACAGAGA CCTATATATC CCGAGATATG 
CAGTTCATGA GCAAAACAAT CCAGCACAAT GGAGTGTACT ATCCAGCCAG TATCAAAAAA 
GTCCTGAGAG TAGGTCCATG GATAAACACG ATACTTGATG ATTTTAAAGT TAGTTTAGAA 
TCTATAGGCA GCTTAACACA GGAGTTAGAA TACAGAGGAG AAAGCTTATT ATGCAGTTTA 
ATATTTAGGA ACATTTGGTT ATACAATCAA ATTGCTTTGC AACTCCGAAA TCATGCATTA 
TGTAACAATA AGCTATATTT AGATATATTG AAAGTATTAA AACACTTAAA AACTTTTTTT
AATCTTGATA GCATTGATAT GGCTTTATCA TTGTATATGA ATTTGCCTAT GCTGTTTGGT
GGTGGTGATC CTAATTTGTT ATATCGAAGC TTTTATAGGA GAACTCCAGA CTTCCTTACA
GAAGCTATAG TACATTCAGT GTTTGTGTTG AGCTATTATA CTGGTCACGA TTTACAAGAT 
AAGCTCCAGG ATCTTCCAGA TGATAGACTG AACAAATTCT TGACATGTGT CATCACATTT 
GATAAAAATC CCAATGCCGA GTTTGTAACA TTGATGAGGG ATCCACAGGC TTTAGGGTCT 
GAAAGGCAAG CTAAAATTAC TAGTGAGATT AATAGATTAG CAGTAACAGA AGTCTTAAGT 
ATAGCCCCAA ACAAAATATT TTCTAAAAGT GCACAACATT ATACTACCAC TGAGATTGAT 
CTAAATGACA TTATGCAAAA TATAGAACCA ACTTACCCTC ATGGATTAAG AGTTGTTTAT 
GAAAGTTTAC CTTTTTATAA AGCAGAAAAA ATAGTTAATC TTATATCAGG AACAAAATCC 
ATAACTAATA TACTTGAAAA AACATCAGCA ATAGATACAA CTGATATTAA TAGGGCTACT 
GATATGATGA GGAAAAATAT AACTTTACTT ATAAGGATAC TTCCACTAGA TTGTAACAAA 
GACAAAAGAG AGTTATTAAG TTTAGAAAAT CTTAGTATAA CTGAATTAAG CAAGTATGTA 
AGAGAAAGAT CTTGGTCATT ATCCAATATA GTAGGAGTAA CATCGCCAAG TATTATGTTC 
ACAATGAACA TTAAATATAC AACTAGCACT ATAGCCAGTG GTATAATAAT AGAAAAATAT 
AATGTTAATA GTTTAACTCG TGGTGAAAGA GGACCCACCA AGCCATGGGT AGGCTCATCC 
ACGCAGGAGA AAAAAACAAT GCCAGTGTAC AACAGACAAG TTTTAACCAA AAAGCAAAGA 
GACCAAATAG ATTTATTAGC AAAATTAGAC TGGGTATATG CATCCATAGA CAACAAAGAT 
GAATTCATGG AAGAACTGAG TACTGGAACA CTTGGACTGT CATATGAAAA AGCCAAAAAG 
TTGTTTCCAC AATATCTAAG TGTCAATTAT TTACACCGTT TAACAGTCAG TAGTAGACCA 
TGTGAATTCC CTGCATCAAT ACCAGCTTAT AGAACAACAA ATTATCATTT TGATACTAGT 
CCTATCAATC ATGTATTAAC AGAAAAGTAT GGAGATGAAG ATATCGACAT TGTGTTTCAA 
AATTGCATAA GTTTTGGTCT TAGCCTGATG TCGGTTGTGG AACAATTCAC AAACATATGT
CCTAATAGAA TTATTCTCAT ACCGAAGCTG AATGAGATAC ATTTGATGAA ACCTCCTATA 
TTTACAGGAG ATGTTGATAT CATCAAGTTG AAGCAAGTGA TACAAAAGCA GCACATGTTC
CTACCAGATA AAATAAGTTT AACCCAATAT GTAGAATTAT TCTTAAGTAA CAAAGCACTT 
AAATCTGGAT CTCACATCAA CTCTAATTTA ATATTAGTAC ATAAAATGTC TGATTATTTT 
CATAATGCTT ATATTTTAAG TACTAATTTA GCTGGACATT GGATTCTGAT TATTCAACTT
ATGAAAGATT CAAAAGGTAT TTTTGAAAAA GATTGGGGAG AGGGGTACAT AACTGATCAT 13080

ATGTTCATTA ATTTGAATGT TTTCTTTAAT GCTTATAAGA CTTATTTGCT ATGTTTTCAT 13140 

AAAGGTTATG GTAAAGCAAA ATTAGAATGT GATATGAACA CTTCAGATCT TCTTTGTGTT 13200

TTGGAGTTAA TAGACAGTAG CTACTGGAAA TCTATGTCTA AAGTTTTCCT AGAACAAAAA 13260 

GTCATAAAAT ACATAGTCAA TCAAGACACA AGTTTGCGTA GAATAAAAGG CTGTCACAGT 13320

TTTAAGTTGT GGTTTTTAAA ACGCCTTAAT AATGCTAAAT TTACCGTATG CCCTTGGGTT 13380 

GTTAACATAG ATTATCACCC AACACACATG AAAGCTATAT TATCTTACAT AGATTTAGTT 13440 

AGAATGGGGT TAATAAATGT AGATAAATTA ACCATTAAAA ATAAAAACAA ATTCAATGAT 13500 

GAATTTTACA CATCAAATCT CTTTTACATT AGTTATAACT TTTCAGACAA CACTCATTTG 13560 

CTAACAAAAC AAATAAGAAT TGCTAATTCA GAATTAGAAG ATAATTATAA CAAACTATAT 13620 

CACCCAACCC CAGAAACTTT AGAAAATATG TCATTAATTC CTGTTAAAAG TAATAATAGT 13680

AACAAACCTA AATTTTGTAT AAGTGGAAAT ACCGAATCTA TGATGATGTC AACATTCTCT 13740 

AGTAAAATGC ATATTAAATC TTCCACTGTT ACCACAAGAT TCAATTATAG CAAACAAGAC 13800

TTGTACAATT TATTTCCAAT TGTTGTGATA GACAAGATTA TAGATCATTC AGGTAATACA 13860 

GCAAAATCTA ACCAACTTTA CACCACCACT TCACATCAGA CATCTTTAGT AAGGAATAGT 13920 

GCATCACTTT ATTGCATGCT TCCTTGGCAT CATGTCAATA GATTTAACTT TGTATTTAGT 13980 

TCCACAGGAT GCAAGATCAG TATAGAGTAT ATTTTAAAAG ATCTTAAGAT TAAGGACCCC 14040 

AGTTGTATAG CATTCATAGG TGAAGGAGCT GGTAACTTAT TATTACGTAC GGTAGTAGAA 14100 

CTTCATCCAG ACATAAGATA CATTTACAGA AGTTTAAAAG ATTGCAATGA TCATAGTTTA 14160 

CCTATTGAAT TTCTAAGGTT ATACAACGGG CATATAAACA TAGATTATGG TGAGAATTTA 14220 

ACCATTCCTG CTACAGATGC AACTAATAAC ATTCATTGGT CTTATTTACA TATAAAATTT 14280 

GCAGAACCTA TTAGCATCTT TGTCTGCGAT GCTGAATTAC CTGTTACAGC CAATTGGAGT 14340 

AAAATTATAA TTGAATGGAG TAAGCATGTA AGAAAGTGCA AGTACTGTTC TTCTGTAAAT 14400 

AGATGCATTT TAATTGCAAA ATATCATGCT CAAGATGACA TTGATTTCAA ATTAGATAAC 14460 

ATTACTATAT TAAAAACTTA CGTGTGCCTA GGTAGCAAGT TAAAAGGATC TGAAGTTTAC 14520 

TTAATCCTTA CAATAGGCCC TGCAAATATA CTTCCTGTTT TTGATGTTGT ACAAAATGCT 14580 
AAATTGATAC TTTCAAGAAC TAAAAATTTC ATTATGCCTA AAAAAACTGA CAAGGAATCT 14640

ATCGATGCAA ATATTAAAAG CTTAATACCT TTCCTTTGTT ACCCTATAAC AAAAAAAGGA 14700 

ATTAAGACTT CATTGTCAAA ATTGAAGAGT GTAGTTAATG GAGATATATT ATCATATTCT 14760 

ATAGCTGGAC GTAATGAAGT ATTCAGCAAC AAGCTTATAA ACCACAAGCA TATGAATATC 14820 

CTAAAATGGC TAGATCATGT TTTAAATTTT AGATCAGCTG AACTTAATTA CAATCATTTA 14880 

TACATGATAG AGTCCACATA TCCTTACTTA AGTGAATTGT TAAATAGTTT AACAACCAAT 14940 

GAGCTCAAGA AGCTGATTAA AATAACAGGT AGTGTGCTAT ACAACCTTCC CAACGAACAG 15000 

TAGTTTAAAA TATCATTAAC AAGTTTGGTC AAATTTAGAT GCTAACACAT CATTATATTA 15060 

TAGTTATTAA AGAATATACA AACTTTTCAA TAATTTAGCA TATTGATTCC AAAATTATCA 15120 

TTTTAGTCTT AAGGGGTTAA ATAAAAGTCT AAAACTAACA ATTATACATG TGCATTCACA 15180 

ACACAACGAG ACATTAGTTT TTGACACTTT TTTTCTCGT 15219 
(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2166 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu Thr Asp
1 5 10 15

Ser Tyr Leu Lys Gly Val Ile Ser Phe Ser Glu Cys Asn Ala Leu Gly
20 25 30

Ser Tyr Leu Phe Asn Gly Pro Tyr Leu Lys Asn Asp Tyr Thr Asn Leu
35 40 45

Ile Ser Arg Gln Ser Pro Leu Leu Glu His Met Asn Leu Lys Lys Leu
50 55 60

Thr Ile Thr Gln Ser Leu Ile Ser Arg Tyr His Lys Gly Glu Leu Lys
65 70 75 80
Leu Glu Glu Pro Thr Tyr Phe Gln Ser Leu Leu Met Thr Tyr Lys Ser
85 90 95

Met Ser Ser Ser Glu Gln Ile Ala Thr Thr Asn Leu Leu Lys Lys Ile
100 105 110

Ile Arg Arg Ala Ile Glu Ile Ser Asp Val Lys Val Tyr Ala Ile Leu
115 120 125

Asn Lys Leu Gly Leu Lys Glu Lys Asp Arg Val Lys Pro Asn Asn Asn
130 135 140

Ser Gly Asp Glu Asn Ser Val Leu Thr Thr Ile Ile Lys Asp Asp Ile
145 150 155 160

Leu Ser Ala Val Glu Asn Asn Gln Ser Tyr Thr Asn Ser Asp Lys Ser
165 170 175

His Ser Val Asn Gln Asn Ile Thr Ile Lys Thr Thr Leu Leu Lys Lys
180 185 190

Leu Met Cys Ser Met Gln His Pro Pro Ser Trp Leu Ile His Trp Phe
195 200 205

Asn Leu Tyr Thr Lys Leu Asn Asn Ile Leu Thr Gln Tyr Arg Ser Asn
210 215 220

Glu Val Lys Ser His Gly Phe Ile Leu Ile Asp Asn Gln Thr Leu Ser
225 230 235 240

Gly Phe Gln Phe Ile Leu Asn Gln Tyr Gly Cys Ile Val Tyr His Lys
245 250 255

Gly Leu Lys Lys Ile Thr Thr Thr Thr Tyr Asn Gln Phe Leu Thr Trp
260 265 270

Lys Asp Ile Ser Leu Ser Arg Leu Asn Val Cys Leu Ile Thr Trp Ile
275 280 285

Ser Asn Cys Leu Asn Thr Leu Asn Lys Ser Leu Gly Leu Arg Cys Gly
290 295 300

Phe Asn Asn Val Val Leu Ser Gln Leu Phe Leu Tyr Gly Asp Cys Ile
305 310 315 320

Leu Lys Leu Phe His Asn Glu Gly Phe Tyr Ile Ile Lys Glu Val Glu
325 330 335

Gly Phe Ile Met Ser Leu Ile Leu Asn Ile Thr Glu Glu Asp Gln Phe
340 345 350 
Lys Lys Arg Phe Tyr Asn Ser Met Leu Asn Asn Ile Thr Asp Ala Ala
355 360 365

Ile Lys Ala Gln Lys Asp Leu Leu Ser Arg Val Cys His Thr Leu Leu
370 375 380

Asp Lys Thr Val Ser Asp Asn Ile Ile Asn Gly Lys Trp Ile Ile Leu
385 390 395 400

Leu Ser Lys Phe Leu Lys Leu Ile Lys Leu Ala Gly Asp Asn Asn Leu
405 410 415

Asn Asn Leu Ser Glu Leu Tyr Phe Leu Phe Arg Ile Phe Gly His Pro
420 425 430

Met Val Asp Glu Arg Gln Ala Met Asp Ser Val Arg Ile Asn Cys Asn
435 440 445

Glu Thr Arg Phe Tyr Leu Leu Ser Ser Leu Ser Thr Leu Arg Gly Ala
450 455 460

Phe Ile Tyr Arg Ile Ile Lys Gly Phe Val Asn Thr Tyr Asn Arg Trp
465 470 475 480

Pro Thr Leu Arg Asn Ala Ile Val Leu Pro Leu Arg Trp Leu Asn Tyr
485 490 495

Tyr Lys Leu Asn Thr Tyr Pro Ser Leu Leu Glu Ile Thr Glu Asn Asp
500 505 510

Leu Ile Ile Leu Ser Gly Leu Arg Phe Tyr Arg Glu Phe His Leu Pro
515 520 525

Lys Lys Val Asp Leu Glu Met Ile Ile Asn Asp Lys Ala Ile Ser Pro
530 535 540

Pro Lys Asp Leu Ile Trp Thr Ser Phe Pro Arg Asn Tyr Met Pro Ser
545 550 555 560

His Ile Gln Asn Tyr Ile Glu His Glu Lys Leu Lys Phe Ser Glu Ser
565 570 575

Asp Arg Ser Arg Arg Val Leu Glu Tyr Tyr Leu Arg Asp Asn Lys Phe
580 585 590

Asn Glu Cys Asp Leu Tyr Asn Cys Val Val Asn Gln Ser Tyr Leu Asn
595 600 605

Asn Ser Asn His Val Val Ser Leu Thr Gly Lys Glu Arg Glu Leu Ser
610 615 620

Val Gly Arg Met Phe Ala Met Gln Pro Gly Met Phe Arg Gln Ile Gln
625 630 635 640

Ile Leu Ala Glu Lys Met Ile Ala Glu Asn Ile Leu Gln Phe Phe Pro
645 650 655

Glu Ser Leu Thr Arg Tyr Gly Asp Leu Glu Leu Gln Lys Ile Leu Glu
660 665 670

Leu Lys Ala Gly Ile Ser Asn Lys Ser Asn Arg Tyr Asn Asp Asn Tyr
675 680 685

Asn Asn Tyr Ile Ser Lys Cys Ser Ile Ile Thr Asp Leu Ser Lys Phe
690 695 700

Asn Gln Ala Phe Arg Tyr Glu Thr Ser Cys Ile Cys Ser Asp Val Leu
705 710 715 720

Asp Glu Leu His Gly Val Gln Ser Leu Phe Ser Trp Leu His Leu Thr
725 730 735

Ile Pro Leu Val Thr Ile Ile Cys Thr Tyr Arg His Ala Pro Pro Phe
740 745 750

Ile Lys Asp His Val Val Asn Leu Asn Glu Val Asp Glu Gln Ser Gly
755 760 765

Leu Tyr Arg Tyr His Met Gly Gly Ile Glu Gly Trp Cys Gln Lys Leu
770 775 780

Trp Thr Ile Glu Ala Ile Ser Leu Leu Asp Leu Ile Ser Leu Lys Gly
785 790 795 800

Lys Phe Ser Ile Thr Ala Leu Ile Asn Gly Asp Asn Gln Ser Ile Asp
805 810 815

Ile Ser Lys Pro Val Arg Leu Ile Glu Gly Gln Thr His Ala Gln Ala
820 825 830

Asp Tyr Leu Leu Ala Leu Asn Ser Leu Lys Leu Leu Tyr Lys Glu Tyr
835 840 845

Ala Gly Ile Gly His Lys Leu Lys Gly Thr Glu Thr Tyr Ile Ser Arg
850 855 860

Asp Met Gln Phe Met Ser Lys Thr Ile Gln His Asn Gly Val Tyr Tyr
865 870 875 880

Pro Ala Ser Ile Lys Lys Val Leu Arg Val Gly Pro Trp Ile Asn Thr
885 890 895

Ile Leu Asp Asp Phe Lys Val Ser Leu Glu Ser Ile Gly Ser Leu Thr
900 905 910 
Gln Glu Leu Glu Tyr Arg Gly Glu Ser Leu Leu Cys Ser Leu Ile Phe
915 920 925

Arg Asn Ile Trp Leu Tyr Asn Gln Ile Ala Leu Gln Leu Arg Asn His
930 935 940

Ala Leu Cys Asn Asn Lys Leu Tyr Leu Asp Ile Leu Lys Val Leu Lys
945 950 955 960

His Leu Lys Thr Phe Phe Asn Leu Asp Ser Ile Asp Met Ala Leu Ser
965 970 975

Leu Tyr Met Asn Leu Pro Met Leu Phe Gly Gly Gly Asp Pro Asn Leu
980 985 990

Leu Tyr Arg Ser Phe Tyr Arg Arg Thr Pro Asp Phe Leu Thr Glu Ala
995 1000 1005

Ile Val His Ser Val Phe Val Leu Ser Tyr Tyr Thr Gly His Asp Leu
1010 1015 1020

Gln Asp Lys Leu Gln Asp Leu Pro Asp Asp Arg Leu Asn Lys Phe Leu
1025 1030 1035 1040

Thr Cys Val Ile Thr Phe Asp Lys Asn Pro Asn Ala Glu Phe Val Thr
1045 1050 1055

Leu Met Arg Asp Pro Gln Ala Leu Gly Ser Glu Arg Gln Ala Lys Ile
1060 1065 1070

Thr Ser Glu Ile Asn Arg Leu Ala Val Thr Glu Val Leu Ser Ile Ala
1075 1080 1085

Pro Asn Lys Ile Phe Ser Lys Ser Ala Gln His Tyr Thr Thr Thr Glu
1090 1095 1100

Ile Asp Leu Asn Asp Ile Met Gln Asn Ile Glu Pro Thr Tyr Pro His
1105 1110 1115 1120

Gly Leu Arg Val Val Tyr Glu Ser Leu Pro Phe Tyr Lys Ala Glu Lys
1125 1130 1135

Ile Val Asn Leu Ile Ser Gly Thr Lys Ser Ile Thr Asn Ile Leu Glu
1140 1145 1150

Lys Thr Ser Ala Ile Asp Thr Thr Asp Ile Asn Arg Ala Thr Asp Met
1155 1160 1165

Met Arg Lys Asn Ile Thr Leu Leu Ile Arg Ile Leu Pro Leu Asp Cys
1170 1175 1180 
Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser Ile Thr
1185 1190 1195 1200

Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn Ile
1205 1210 1215

Val Gly Val Thr Ser Pro Ser Ile Met Phe Thr Met Asn Ile Lys Tyr
1220 1225 1230

Thr Thr Ser Thr Ile Ala Ser Gly Ile Ile Ile Glu Lys Tyr Asn Val
1235 1240 1245

Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly
1250 1255 1260

Ser Ser Thr Gln Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gln Val
1265 1270 1275 1280

Leu Thr Lys Lys Gln Arg Asp Gln Ile Asp Leu Leu Ala Lys Leu Asp
1285 1290 1295 

Trp Val Tyr Ala Ser Ile Asp Asn Lys Asp Glu Phe Met Glu Glu Leu
1300 1305 1310

Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe
1315 1320 1325

Pro Gln Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser
1330 1335 1340

Arg Pro Cys Glu Phe Pro Ala Ser Ile Pro Ala Tyr Arg Thr Thr Asn
1345 1350 1355 1360

Tyr His Phe Asp Thr Ser Pro Ile Asn His Val Leu Thr Glu Lys Tyr
1365 1370 1375

Gly Asp Glu Asp Ile Asp Ile Val Phe Gln Asn Cys Ile Ser Phe Gly
1380 1385 1390

Leu Ser Leu Met Ser Val Val Glu Gln Phe Thr Asn Ile Cys Pro Asn
1395 1400 1405

Arg Ile Ile Leu Ile Pro Lys Leu Asn Glu Ile His Leu Met Lys Pro
1410 1415 1420

Pro Ile Phe Thr Gly Asp Val Asp Ile Ile Lys Leu Lys Gln Val Ile
1425 1430 1435 1440

Gln Lys Gln His Met Phe Leu Pro Asp Lys Ile Ser Leu Thr Gln Tyr
1445 1450 1455

Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His Ile
1460 1465 1470

Asn Ser Asn Leu Ile Leu Val His Lys Met Ser Asp Tyr Phe His Asn
1475 1480 1485

Ala Tyr Ile Leu Ser Thr Asn Leu Ala Gly His Trp Ile Leu Ile Ile
1490 1495 1500

Gln Leu Met Lys Asp Ser Lys Gly Ile Phe Glu Lys Asp Trp Gly Glu
1505 1510 1515 1520

Gly Tyr Ile Thr Asp His Met Phe Ile Asn Leu Asn Val Phe Phe Asn
1525 1530 1535 

Ala Tyr Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys Ala 
1540 1545 1550 

Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu 
1555 1560 1565 

Leu Ile Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu
1570 1575 1580

Gln Lys Val Ile Lys Tyr Ile Val Asn Gln Asp Thr Ser Leu Arg Arg
1585 1590 1595 1600

Ile Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asn
1605 1610 1615

Asn Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn Ile Asp Tyr His
1620 1625 1630

Pro Thr His Met Lys Ala Ile Leu Ser Tyr Ile Asp Leu Val Arg Met
1635 1640 1645

Gly Leu Ile Asn Val Asp Lys Leu Thr Ile Lys Asn Lys Asn Lys Phe
1650 1655 1660

Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr Ile Ser Tyr Asn Phe
1665 1670 1675 1680

Ser Asp Asn Thr His Leu Leu Thr Lys Gln Ile Arg Ile Ala Asn Ser
1685 1690 1695

Glu Leu Glu Asp Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr
1700 1705 1710

Leu Glu Asn Met Ser Leu Ile Pro Val Lys Ser Asn Asn Ser Asn Lys
1715 1720 1725

Pro Lys Phe Cys Ile Ser Gly Asn Thr Glu Ser Met Met Met Ser Thr
1730 1735 1740 






Phe Ser Ser Lys Met His Ile Lys Ser Ser Thr Val Thr Thr Arg Phe
1745 1750 1755 1760

Asn Tyr Ser Lys Gln Asp Leu Tyr Asn Leu Phe Pro Ile Val Val Ile
1765 1770 1775

Asp Lys Ile Ile Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gln Leu
1780 1785 1790

Tyr Thr Thr Thr Ser His Gln Thr Ser Leu Val Arg Asn Ser Ala Ser
1795 1800 1805

Leu Tyr Cys Met Leu Pro Trp His His Val Asn Arg Phe Asn Phe Val
1810 1815 1820

Phe Ser Ser Thr Gly Cys Lys Ile Ser Ile Glu Tyr Ile Leu Lys Asp
1825 1830 1835 1840

Leu Lys Ile Lys Asp Pro Ser Cys Ile Ala Phe Ile Gly Glu Gly Ala
1845 1850 1855

Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp Ile Arg
1860 1865 1870

Tyr Ile Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro Ile
1875 1880 1885

Glu Phe Leu Arg Leu Tyr Asn Gly His Ile Asn Ile Asp Tyr Gly Glu
1890 1895 1900

Asn Leu Thr Ile Pro Ala Thr Asp Ala Thr Asn Asn Ile His Trp Ser
1905 1910 1915 1920

Tyr Leu His Ile Lys Phe Ala Glu Pro Ile Ser Ile Phe Val Cys Asp
1925 1930 1935

Ala Glu Leu Pro Val Thr Ala Asn Trp Ser Lys Ile Ile Ile Glu Trp
1940 1945 1950

Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys
1955 1960 1965

Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu
1970 1975 1980

Asp Asn Ile Thr Ile Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu
1985 1990 1995 2000

Lys Gly Ser Glu Val Tyr Leu Ile Leu Thr Ile Gly Pro Ala Asn Ile
2005 2010 2015



Leu Pro Val Phe Asp Val Val Gln Asn Ala Lys Leu Ile Leu Ser Arg
2020 2025 2030

Thr Lys Asn Phe Ile Met Pro Lys Lys Thr Asp Lys Glu Ser Ile Asp
2035 2040 2045

Ala Asn Ile Lys Ser Leu Ile Pro Phe Leu Cys Tyr Pro Ile Thr Lys
2050 2055 2060

Lys Gly Ile Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Asn Gly
2065 2070 2075 2080

Asp Ile Leu Ser Tyr Ser Ile Ala Gly Arg Asn Glu Val Phe Ser Asn
2085 2090 2095

Lys Leu Ile Asn His Lys His Met Asn Ile Leu Lys Trp Leu Asp His
2100 2105 2110

Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met
2115 2120 2125

Ile Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr
2130 2135 2140

Thr Asn Glu Leu Lys Lys Leu Ile Lys Ile Thr Gly Ser Val Leu Tyr
2145 2150 2155 2160

Asn Leu Pro Asn Glu Gln
2165 

(2) INFORMATION FOR SEQ ID NO: 29:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15219 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

ACGGGAAAAA AATGCGTACT ACAAACTTGC ACATTCGAAA AAAATGGGGC AAATAAGAAC 60 

TTGATAAGTG CTATTTAAGT CTAACCTTTT CAATCAGAAA TGGGGTGCAA TTCACTGAGC 120 

ATGATAAAGG TTAGATTACA AAATTTATTT GACAATGACG AAGTAGCATT GTTAAAAATA 180 

ACATGTTATA CTGATAAATT AATTCTTCTG ACCAATGCAT TAGCCAAAGC AGCAATACAT 240 
ACAATTAAAT TAAACGGCAT AGTTTTTATA CATGTTATAA CAAGCAGTGA AGTGTGCCCT 300

GATAACAATA TTGTAGTGAA ATCTAACTTT ACAACAATGC CAATACTACA AAATGGAGGA 360 

TACATATGGG AATTGATTGA GTTGACACAC TGCTCTCAAT TAAACGGTTT AATGGATGAT 420 

AATTGTGAAA TCAAATTTTC TAAAAGACTA AGTGACTCAG TAATGACTAA TTATATGAAT 480 

CAAATATCTG ACTTACTTGG GCTTGATCTC AATTCATGAA TTATGTTTAG TCTAATTCAA 540 

TAGACATGTG TTTATTACCA TTTTAGTTAA TATAAAAACT CATCAAAGGG AAATGGGGCA 600 

AATAAACTCA CCTAATCAAT CAAACCATGA GCACTACAAA TGACAACACT ACTATGCAAA 660 

GATTGATGAT CACAGACATG AGACCCCTGT CAATGGATTC AATAATAACA TCTCTTACCA 720 

AAGAAATCAT CACACACAAA TTCATATACT TGATAAACAA TGAATGTATT GTAAGAAAAC 780 

TTGATGAAAG ACAAGCTACA TTTACATTCT TAGTCAATTA TGAGATGAAG CTACTGCACA 840 

AAGTAGGGAG TACCAAATAC AAAAAATACA CTGAATATAA TACAAAATAT GGCACTTTCC 900 

CCATGCCTAT ATTTATCAAT CACGGCGGGT TTCTAGAATG TATTGGCATT AAGCCTACAA 960 

AACACACTCC TATAATATAC AAATATGACC TCAACCCGTG AATTCCAACA AAAAAACCAA 1020 

CCCAACCAAA CCAAACTATT CCTCAAACAA CAGTGCTCAA TAGTTAAGAA GGAGCTAATC 1080 

CATTTTAGTA ATTAAAAATA AAAGTAAAGC CAATAACATA AATTGGGGCA AATACAAAGA 1140

TGGCTCTTAG CAAAGTCAAG TTGAATGATA CATTAAATAA GGATCAGCTG CTGTCATCCA 1200 

GCAAATACAC TATTCAACGT AGTACAGGAG ATAATATTGA CACTCCCAAT TATGATGTGC 1260 

AAAAACACCT AAACAAACTA TGTGGTATGC TATTAATCAC TGAAGATGCA AATCATAAAT 1320 

TCACAGGATT AATAGGTATG TTATATGCTA TGTCCAGGTT AGGAAGGGAA GACACTATAA 1380 

AGATACTTAA AGATGCTGGA TATCATGTTA AAGCTAATGG AGTAGATATA ACAACATATC 1440 

GTCAAGATAT AAATGGAAAG GAAATGAAAT TCGAAGTATT AACATTATCA AGCTTGACAT 1500 

CAGAAATACA AGTCAATATT GAGATAGAAT CTAGAAAGTC CTACAAAAAA ATGCTAAAAG 1560 

AGATGGGAGA AGTGGCTCCA GAATATAGGC ATGATTCTCC AGACTGTGGG ATGATAATAC 1620 

TGTGTATAGC TGCACTTGTG ATAACCAAAT TAGCAGCAGG AGACAGATCA GGTCTTACAG 1680 

CAGTAATTAG GAGGGCAAAC AATGTCTTAA AAAACGAAAT AAAACGATAC AAGGGCCTCA 1740 

TACCAAAGGA TATAGCTAAC AGTTTTTATG AAGTGTTTGA AAAACACCCT CATCTTATAG 1800 
ATGTTTTCGT GCACTTTGGC ATTGCACAAT CATCCACAAG AGGGGGTAGT AGAGTTGAAG
GAATCTTTGC AGGATTGTTT ATGAATGCCT ATGGTTCAGG GCAAGTAATG CTAAGATGGG 
GAGTTTTAGC CAAATCTGTA AAAAATATCA TGCTAGGACA TGCTAGTGTC CAGGCAGAAA 
TGGAGCAAGT TGTGGAAGTC TATGAGTATG CACAGAAGTT GGGAGGAGAA GCTGGATTCT 
ACCATATATT GAACAATCCA AAAGCATCAT TGCTGTCATT AACTCAATTT CCCAACTTCT 
CAAGTGTGGT CCTAGGCAAT GCAGCAGGTC TAGGCATAAT GGGAGAGTAT AGAGGTACAC 
CAAGAAACCA GGATCTTTAT GATGCAGCTA AAGCATATGC AGAGCAACTC AAAGAAAATG 
GAGTAATAAA CTACAGTGTA TTAGACTTAA CAGGAGAAGA ATTGGAAGCC ATAAAGCATC 
AACTCAACCC CAAAGAAGAT GATGTAGAGC TTTAAGTTAA CAAAAAATAC GGGGCAAATA 
AGTCAACATG GAGAAGTTTG CACCTGAATT TCATGGAGAA GATGCAAATA ACAAAGCTAC 
CAAATTCCTA GAATCAATAA AGGGCAAGTT CGCATCATCC AAAGATCCTA AGAAGAAAGA 
TAGCATAATA TCTGTTAACT CAATAGATAT AGAAGTAACT AAAGAGAGCC CGATAACATC 
TGGCACCAAC ATCATCAATC CAACAAGTGA AGCCGACAGT ACCCCAGAAA CAAAAGCCAA 
CTACCCAAGA AAACCCCTAG TAAGCTTCAA AGAAGATCTC ACCCCAAGTG ACAACCCTTT 
TTCTAAGTTG TACAAGGAAA CAATAGAAAC ATTTGATAAC AATGAAGAAG AATCTAGCTA 
CTCATATGAA GAGATAAATG ATCAAACAAA TGACAACATT ACAGCAAGAC TAGATAGAAT 
TGATGAAAAA TTAAGTGAAA TATTAGGAAT GCTCCATACA TTAGTAGTTG CAAGTGCAGG 
ACCCACTTCA GCTCGCGATG GAATAAGAGA TGCTATGGTT GGTCTAAGAG AAGAGATGAT 
AGAAAAAATA AGAGCGGAAG CATTAATGAC CAATGATAGG TTAGAGGCTA TGGCAAGACT 
TAGGAATGAG GAAAGCGAAA AAATGGCAAA AGACACCTCA GATGAAGTGT CTCTTAATCC 
AACTTCCAAA AAATTGAGTG ACTTGTTGGA AGACAACGAT AGTGACAATG ATCTATCACT 
TGATGATTTT TGATCAGCGA TCAACTCACT CAGCAATCAA CAACATCAAT AAAACAGACA 
TCAATCCATT GAATCAACTG CCAGACCGAA CAAACAAACG TCCATCAGTA GAACCACCAA 
CCAATCAATC AACCAATTGA TCAATCAGCA ACCCGACAAA ATTAACAATA TAGTAACAAA 
AAAAGAACAA GATGGGGCAA ATATGGAAAC ATACGTGAAC AAGCTTCACG AAGGCTCCAC
ATACACAGCA GCTGTTCAGT ACAATGTTCT AGAAAAAGAT GATGATCCTG CATCACTAAC 
AATATGGGTG CCTATGTTCC AGTCATCTGT GCCAGCAGAC TTGCTCATAA AAGAACTTGC
AAGCATCAAT ATACTAGTGA AGCAGATCTC TACGCCCAAA GGACCTTCAC TACGAGTCAC 
GATTAACTCA AGAAGTGCTG TGCTGGCTCA AATGCCTAGT AATTTCATCA TAAGCGCAAA 
TGTATCATTA GATGAAAGAA GCAAATTAGC ATATGATGTA ACTACACCTT GTGAAATCAA 
AGCATGCAGT CTAACATGCT TAAAAGTAAA AAGTATGTTA ACTACAGTCA AAGATCTTAC 
CATGAAGACA TTCAACCCCA CTCATGAGAT CATTGCTCTA TGTGAATTTG AAAATATTAT 
GACATCAAAA AGAGTAATAA TACCAACCTA TCTAAGATCA ATTAGTGTCA AGAACAAGGA 
TCTGAACTCA CTAGAAAATA TAGCAACCAC CGAATTCAAA AATGCTATCA CCAATGCAAA 
AATTATTCCT TATGCAGGAT TAGTGTTAGT TATCACAGTT ACTGACAATA AAGGAGCATT 
CAAATATATC AAACCACAGA GTCAATTTAT AGTAGATCTT GGTGCCTACC TAGAAAAAGA 
GAGCATATAT TATGTGACTA CTAATTGGAA GCATACAGCT ACACGTTTTT CAATCAAACC 
ACTAGAGGAT TAAACTTAAT TATCAACACT GAATGACAGG TCCACATATA TCCTCAAACT 
ACACACTATA TCCAAACATC ATAAACATCT ACACTACACA CTTCATCACA CAAACCAATC 
CCACTCAAAA TCCAAAATCA CTACCAGCCA CTATCTGCTA GACCTAGAGT GCGAATAGGT 
AAATAAAACC AAAATATGGG GTAAATAGAC ATTAGTTAGA GTTCAATCAA TCTTAACAAC 
CATTTATACC GCCAATTCAA CACATATACT ATAAATCTTA AAATGGGAAA TACATCCATC 
ACAATAGAAT TCACAAGCAA ATTTTGGCCC TATTTTACAC TAATACATAT GATCTTAACT 
CTAATCTTTT TACTAATTAT AATCACTATT ATGATTGCAA TACTAAATAA GCTAAGTGAA 
CATAAAGCAT TCTGTAACAA AACTCTTGAA CTAGGACAGA TGTATCAAAT CAACACATAG 
AGTTCTACCA TTATGCTGTG TCAAATTATA ATCCTGTATA TATAAACAAA CAAATCCAAT 
CTTCTCACAG AGTCATGGTG TCGCAAAACC ACGCTAACTA TCATGGTAGC ATAGAGTAGT 
TATTTAAAAA TTAACATAAT GATGAATTGT TAGTATGAGA TCAAAAACAA CATTGGGGCA 
AATGCAACCA TGTCCAAACA CAAGAATCAA CGCACTGCCA GGACTCTAGA AAAGACCTGG 
GATACTCTTA ATCATCTAAT TGTAATATCC TCTTGTTTAT ACAGATTAAA TTTAAAATCT
ATAGCACAAA TAGCACTATC AGTTTTGGCA ATGATAATCT CAACCTCTCT CATAATTGCA 
GCCATAATAT TCATCATCTC TGCCAATCAC AAAGTTACAC TAACAACGGT CACAGTTCAA 
ACAATAAAAA ACCACACTGA AAAAAACATC ACCACCTACC CTACTCAAGT CTCACCAGAA 4980

AGGGTTAGTT CATCCAAGCA ACCCACAACC ACATCACCAA TCCACACAAG TTCAGCTACA 5040 

ACATCACCCA ATACAAAATC AGAAACACAC CATACAACAG CACAAACCAA AGGCAGAACC 5100 

ACCACTTCAA CACAGACCAA CAAGCCAAGC ACAAAACCAC GTCCAAAAAA TCCACCAAAA 5160 

AAAGATGATT ACCATTTTGA AGTGTTCAAC TTCGTTCCCT GCAGTATATG TGGCAACAAT 5220

CAACTTTGCA AATCCATCTG CAAAACAATA CCAAGCAACA AACCAAAGAA GAAACCAACC 5280 

ATCAAACCCA CAAACAAACC AACCACCAAA ACCACAAACA AAAGAGACCC AAAAACACCA 5340 

GCCAAAACGA CGAAAAAAGA AACTACCACC AACCCAACAA AAAAACTAAC CCTCAAGACC 5400 

ACAGAAAGAG ACACCAGCAC CTCACAATCC ACTGCACTCG ACACAACCAC ATTAAAACAC 5460 

ACAGTCCAAC AGCAATCCCT CCTCTCAACC ACCCCCGAAA ACACACCCAA CTCCACACAA 5520 

ACACCCACAG CATCCGAGCC CTCCACACCA AACTCCACCC AAAAAACCCA GCCACATGCT 5580 

TAGTTATTCA AAAACTACAT CTTAGCAGAG AACCGTGATC TATCAAGCAA GAACGAAATT 5640

AAACCTGGGG CAAATAACCA TGGAGTTGAT GATCCACAAG TCAAGTGCAA TCTTCCTAAC 5700 

TCTTGCTATT AATGCATTGT ACCTCACCTC AAGTCAGAAC ATAACTGAGG AGTTTTACCA 5760 

ATCGACATGT AGTGCAGTTA GCAGAGGTTA TTTTAGTGCT TTAAGAACAG GTTGGTATAC 5820 

TAGTGTCATA ACAATAGAAT TAAGTAATAT AAAAGAAACC AAATGCAATG GAACTGACAC 5880 

TAAAGTAAAA CTTATGAAAC AAGAATTAGA TAAGTATAAG AATGGAGTAA CAGAATTACA 5940 

GCTACTTATG CAAAACACAC CAGCTGTCAA CAACCGGGCC AGAAGAGAAG CACCACAGTA 6000 

TATGAACTAC ACAATCAATA CCACTAAAAA CCTAAATGTA TCAATAAGCA AGAAGAGGAA 6060 

ACGAAGATTT CTAGGCTTCT TGTTAGGTGT GGGATCTGCA ATAGCAAGTG GTATAGCTGT 6120 

ATCAAAAGTT CTACACCTTG AAGGAGAAGT GAACAAGATC AAAAATGCTT TGTTGTCTAC 6180 

AAACAAAGCT GTAGTCAGTT TATCAAATGG GGTCAGTGTT TTAACCAGCA AAGTGTTAGA 6240 

TCTCAAGAAT TACATAAATA ACCAATTATT ACCCATAGTA AATCAACAGA GCTGTCGCAT 6300

CTCCAACATT GAAACAGTTA TAGAATTCCA GCAGAAGAAC AGCAGATTGT TGGAAATCAC 6360 

CAGAGAATTT AGTGTCAATG CAGGTGTAAC AACACCTTTA AGCACTTACA TGTTGACAAA 6420

CAGTGAGTTA CTATCATTAA TCAATGATAT GCCTATAACA AATGATCAGA AAAAATTAAT 6480
GTCAAGCAAT GTTCAGATAG TAAGGCAACA AAGTTATTCC ATCATGTCTA TAATAAAGGA 6540

AGAAGTCCTT GCATATGTTG TACAGCTGCC TATCTATGGT GTAATAGATA CACCTTGCTG 6600

GAAATTGCAC ACATCGCCTC TATGCACTAC CAACATCAAA GAAGGATCAA ATATTTGTTT 6660 

AACAAGGACT GATAGAGGAT GGTATTGTGA TAATGCAGGA TCAGTATCCT TCTTTCCACA 6720 

GGCTGACACT TGTAAAGTAC AGTCCAATCG AGTATTTTGT GACACTATGA ACAGTTTGAC 6780 

ATTACCAAGT GAAGTCAGCC TTTGTAACAC TGACATATTC AATTCCAAGT ATGACTGCAA 6840 

AATTATGACA TCAAAAACAG ACATAAGCAG CTCAGTAATT ACTTCTCTTG GAGCTATAGT 6900 

GTCATGCTAT GGTAAAACTA AATGCACTGC ATCCAACAAA AATCGTGGGA TTATAAAGAC 6960 

ATTTTCTAAT GGTTGTGACT ATGTGTCAAA CAAAGGAGTA GATACTGTGT CAGTGGGCAA 7020 

CACTTTATAC TATGTAAACA AGCTGGAAGG CAAGAACCTT TATGTAAAAG GGGAACCTAT 7080 

AATAAATTAC TATGACCCTC TAGTGTTTCC TTCTGATGAG TTTGATGCAT CAATATCTCA 7140 

AGTCAATGAA AAAATCAATC AAAGTTTAGC TTTTATTCGT AGATCTGATG AATTACTACA 7200 

TAATGTAAAT ACTGGCAAAT CTACTACAAA TATTATGATA ACTACAATTA TTATAGTAAT 7260 

CATTGTAGTA TTGTTATCAT TAATAGCTAT TGGTTTACTG TTGTATTGTA AAGCCAAAAA 7320

CACACCAGTT ACACTAAGCA AAGACCAACT AAGTGGAATC AATAATATTG CATTCAGCAA 7380 

ATAGACAAAA AACCACCTGA TCATGTTTCA ACAACAATCT GCTGACCACC AATCCCAAAT 7440 

CAACTTACAA CAAATATTTC AACATCACAG TACAGGCTGA ATCATTTCCT CACATCATGC 7500

TACCCACATA ACTAAGCTAG ATCCTTAACT TATAGTTACA TAAAAACCTC AAGTATCACA 7560 

ATCAACCACT AAATCAACAC ATCATTCACA AAATTAACAG CTGGGGCAAA TATGTCGCGA 7620

AGAAATCCTT GTAAATTTGA GATTAGAGGT CATTGCTTGA ATGGTAGAAG ATGTCACTAC 7680 

AGTCATAATT ACTTTGAATG GCCTCCTCAT GCATTACTAG TGAGGCAAAA CTTCATGTTA 7740 

AACAAGATAC TCAAGTCAAT GGACAAAAGC ATAGACACTT TGTCTGAAAT AAGTGGAGCT 7800 

GCTGAACTGG ATAGAACAGA AGAATATGCT CTTGGTATAG TTGGAGTGCT AGAGAGTTAC 7860 

ATAGGATCTA TAAACAACAT AACAAAACAA TCAGCATGTG TTGCTATGAG TAAACTTCTT 7920

ATTGAGATCA ATAGTGATGA CATTAAAAAG CTTAGAGATA ATGAAGAACC CAATTCACCT 7980 

AAGATAAGAG TGTACAATAC TGTTATATCA TACATTGAGA GCAATAGAAA AAACAACAAG 8040 
CAAACCATCC ATCTGCTCAA GAGACTACCA GCAGACGTGC TGAAGAAGAC AATAAAGAAC
ACATTAGATA TCCACAAAAG CATAACCATA AGCAATCCAA AAGAGTCAAC TGTGAATGAT 
CAAAATGACC AAACCAAAAA TAATGATATT ACCGGATAAA TATCCTTGTA GTATATCATC 
CATATTGATC TCAAGTGAAA GCATGGTTGC TACATTCAAT CATAAAAACA TATTACAATT 
TAACCATAAC TATTTGGATA ACCACCAGCG TTTATTAAAT CATATATTTG ATGAAATTCA 
TTGGACACCT AAAAACTTAT TAGATGCCAC TCAACAATTT CTCCAACATC TTAACATCCC 
TGAAGATATA TATACAGTAT ATATATTAGT GTCATAATGC TTGACCATAA CGACTCTATG 
TCATCCAACC ATAAAACTAT TTTGATAAGG TTATGGGACA AAATGGATCC CATTATTAAT 
GGAAACTCTG CTAATGTGTA TCTAACTGAT AGTTATTTAA AAGGTGTTAT CTCTTTTTCA 
GAGTGTAATG CTTTAGGGAG TTATCTTTTT AACGGCCCTT ATCTTAAAAA TGATTACACC 
AACTTAATTA GTAGACAAAG CCCACTACTA GAGCATATGA ATCTTAAAAA ACTAACTATA 
ACACAGTCAT TAATATCTAG ATATCATAAA GGTGAACTGA AATTAGAAGA ACCAACTTAT
TTCCAGTCAT TACTTATGAC ATATAAAAGT ATGTCCTCGT CTGAACAAAT TGCTACAACT 
AACTTACTTA AAAAAATAAT ACGAAGAGCC ATAGAAATAA GTGATGTAAA GGTGTACGCC 
ATCTTGAATA AACTAGGATT AAAGGAAAAG GACAGAGTTA AGCCCAACAA TAATTCAGGT 
GATGAAAACT CAGTACTTAC AACTATAATT AAAGATGATA TACTTTCGGC TGTGGAAAAC 
AATCAATCAT ATACAAATTC AGACAAAAGT CACTCAGTAA ATCAAAATAT CACTATCAAA
ACAACACTCT TGAAAAAATT GATGTGTTCA ATGCAACATC CTCCATCATG GTTAATACAC
TGGTTCAATT TATATACAAA ATTAAATAAC ATATTAACAC AATATCGATC AAATGAGGTA 
AAAAGTCATG GGTTTATATT AATAGATAAT CAAACTTTAA GTGGTTTTCA GTTTATTTTA 
AATCAATATG GTTGTATCGT TTATCATAAA GGACTCAAAA AAATCACAAC TACTACTTAC 
AATCAATTTT TGACATGGAA AGACATCAGC CTTAGCAGAT TAAATGTTTG CTTAATTACT 
TGGATAAGTA ATTGTTTAAA TACATTAAAC AAAAGCTTAG GGCTGAGATG TGGATTCAAT 
AATGTTGTGT TATCACAATT ATTTCTTTAT GGAGATTGTA TACTGAAATT ATTTCATAAT 
GAAGGCTTCT ACATAATAAA AGAAGTAGAG GGATTTATTA TGTCTTTAAT TCTAAACATA 
ACAGAAGAAG ATCAATTTAG GAAACGATTT TATAATAGCA TGCTAAATAA CATCACAGAT 
GCAGCTATTA AGGCTCAAAA GGACCTACTA TCAAGAGTAT GTCACACTTT ATTAGACAAG
ACAGTGTCTG ATAATATCAT AAATGGTAAA TGGATAATCC TATTAAGTAA ATTTCTTAAA 
TTGATTAAGC TTGCAGGTGA TAATAATCTC AATAACTTGA GTGAGCTATA TTTTCTCTTC 
AGAATCTTTG GACATCCAAT GGTCGATGAA AGACAAGCAA TGGATTCTGT AAGAATTAAC
TGTAATGAAA CTAAGTTCTA CTTATTAAGT AGTCTAAGTA CATTAAGAGG TGCTTTCATT 
TATAGAATCA TAAAAGGGTT TGTAAATACC TACAACAGAT GGCCCACCTT AAGGAATGCT 
ATTGTCCTAC CTCTAAGATG GTTAAACTAC TATAAACTTA ATACTTATCC ATCTCTACTT 
GAAATCACAG AAAATGATTT GATTATTTTA TCAGGATTGC GGTTCTATCG TGAGTTTCAT 
CTGCCTAAAA AAGTGGATCT TGAAATGATA ATAAATGACA AAGCCATTTC ACCTCCAAAA 
GATCTAATAT GGACTAGTTT TCCTAGAAAT TACATGCCAT CACATATACA AAATTATATA 
GAACATGAAA AGTTGAAGTT CTCTGAAAGC GACAGATCGA GAAGAGTACT AGAGTATTAC 
TTGAGAGATA ATAAATTCAA TGAATGCGAT CTATACAATT GTGTAGTCAA TCAAAGCTAT 
CTCAACAACT CTAATCACGT GGTATCACTA ACTGGTAAAG AAAGAGAGCT CAGTGTAGGT 
AGAATGTTTG CTATGCAACC AGGTATGTTT AGGCAAATCC AAATCTTAGC AGAGAAAATG
ATAGCTGAAA ATATTTTACA ATTCTTCCCT GAGAGTTTGA CAAGATATGG TGATCTAGAG 
CTTCAAAAGA TATTAGAATT AAAAGCAGGA ATAAGCAACA AGTCAAATCG TTATAATGAT 
AACTACAACA ATTATATCAG TAAATGTTCT ATCATTACAG ATCTTAGCAA ATTCAATCAG
GCATTTAGAT ATGAAACATC ATGTATCTGC AGTGATGTAT TAGATGAACT GCATGGAGTA 
CAATCTCTGT TCTCTTGGTT GCATTTAACA ATACCTCTTG TCACAATAAT ATGTACATAT 
AGACATGCAC CTCCTTTCAT AAAGGATCAT GTTGTTAATC TTAATGAGGT TGATGAACAA 
AGTGGATTAT ACAGATATCA TATGGGTGGT ATTGAGGGCT GGTGTCAAAA ACTGTGGACC 
ATTGAAGCTA TATCATTATT AGATCTAATA TCTCTCAAAG GGAAATTCTC TATCACAGCT 
CTGATAAATG GTGATAATCA GTCAATTGAT ATAAGCAAAC CAGTTAGACT TATAGAGGGT 
CAGACCCATG CACAAGCAGA TTATTTGTTA GCATTAAATA GCCTTAAATT GTTATATAAA 
GAGTATGCAG GTATAGGCCA TAAGCTTAAG GGAACAGAGA CCTATATATC CCGAGATATG 
CAGTTCATGA GCAAAACAAT CCAGCACAAT GGAGTGTACT ATCCAGCCAG TATCAAAAAA 
GTCCTGAGAG TAGGTCCATG GATAAACACG ATACTTGATG ATTTTAAAGT TAGTTTAGAA
TCTATAGGCA GCTTAACACA GGAGTTAGAA TACAGAGGAG AAAGCTTATT ATGCAGTTTA
ATATTTAGGA ACATTTGGTT ATACAATCAA ATTGCTTTGC AACTCCGAAA TCATGCATTA
TGTAACAATA AGCTATATTT AGATATATTG AAAGTATTAA AACACTTAAA AACTTTTTTT 
AATCTTGATA GCATTGATAT GGCTTTATCA TTGTATATGA ATTTGCCTAT GCTGTTTGGT
GGTGGTGATC CTAATTTGTT ATATCGAAGC TTTTATAGGA GAACTCCAGA CTTCCTTACA 
GAAGCTATAG TACATTCAGT GTTTGTGTTG AGCTATTATA CTGGTCACGA TTTACAAGAT 
AAGCTCCAGG ATCTTCCAGA TGATAGACTG AACAAATTCT TGACATGTGT CATCACATTT 
GATAAAAATC CCAATGCCGA GTTTGTAACA TTGATGAGGG ATCCACAGGC TTTAGGGTCT 
GAAAGGCAAG CTAAAATTAC TAGTGAGATT AATAGATTAG CAGTAACAGA AGTCTTAAGT 
ATAGCCCCAA ACAAAATATT TTCTAAAAGT GCACAACATT ATACTACCAC TGAGATTGAT 
CTAAATGACA TTATGCAAAA TATAGAACCA ACTTACCCTC ATGGATTAAG AGTTGTTTAT 
GAAAGTTTAC CTTTTTATAA AGCAGAAAAA ATAGTTAATC TTATATCAGG AACAAAATCC 
ATAACTAATA TACTTGAAAA AACATCAGCA ATAGATACAA CTGATATTAA TAGGGCTACT 
GATATGATGA GGAAAAATAT AACTTTACTT ATAAGGATAC TTCCACTAGA TTGTAACAAA 
GACAAAAGAG AGTTATTAAG TTTAGAAAAT CTTAGTATAA CTGAATTAAG CAAGTATGTA 
AGAGAAAGAT CTTGGTCATT ATCCAATATA GTAGGAGTAA CATCGCCAAG TATTATGTTC 
ACAATGGACA TTAAATATAC AACTAGCACT ATAGCCAGTG GTATAATAAT AGAAAAATAT
AATGTTAATA GTTTAACTCG TGGTGAAAGA GGACCCACCA AGCCATGGGT AGGCTCATCC 
ACGCAGGAGA AAAAAACAAT GCCAGTGTAC AACAGACAAG TTTTAACCAA AAAGGAAAGA 
GACCAAATAG ATTTATTAGC AAAATTAGAC TGGGTATATG CATCCATAGA CAACAAAGAT 
GAATTCATGG AAGAACTGAG TACTGGAACA CTTGGACTGT CATATGAAAA AGCCAAAAAG
TTGTTTCCAC AATATCTAAG TGTCAATTAT TTACACCGTT TAACAGTCAG TAGTAGACCA 
TGTGAATTCC CTGCATCAAT ACCAGCTTAT AGAACAACAA ATTATCATTT TGATACTAGT 
CCTATCAATC ATGTATTAAC AGAAAAGTAT GGAGATGAAG ATATCGACAT TGTGTTTCAA 
AATTGCATAA GTTTTGGTCT TAGCCTGATG TCGGTTGTGG AACAATTCAC AAACATATGT
CCTAATAGAA TTATTCTCAT ACCGAAGCTG AATGAGATAC ATTTGATGAA ACCTCCTATA
TTTACAGGAG ATGTTGATAT CATCAAGTTG AAGCAAGTGA TACAAAAGCA GCACATGTTC 
CTACCAGATA AAATAAGTTT AACCCAATAT GTAGAATTAT TCTTAAGTAA CAAAGCACTT 
AAATCTGGAT CTCACATCAA CTCTAATTTA ATATTAGTAC ATAAAATGTC TGATTATTTT 
CATAATGCTT ATATTTTAAG TACTAATTTA GCTGGACATT GGATTCTGAT TATTCAACTT 
ATGAAAGATT CAAAAGGTAT TTTTGAAAAA GATTGGGGAG AGGGGTACAT AACTGATCAT 
ATGTTCATTA ATTTGAATGT TTTCTTTAAT GCTTATAAGA CTTATTTGCT ATGTTTTCAT 
AAAGGTTATG GTAAAGCAAA ATTAGAATGT GATATGAACA CTTCAGATCT TCTTTGTGTT 
TTGGAGTTAA TAGACAGTAG CTACTGGAAA TCTATGTCTA AAGTTTTCCT AGAACAAAAA 
GTCATAAAAT ACATAGTCAA TCAAGACACA AGTTTGCGTA GAATAAAAGG CTGTCACAGT 
TTTAAGTTGT GGTTTTTAAA ACGCCTTAAT AATGCTAAAT TTACCGTATG CCCTTGGGTT 
GTTAACATAG ATTATCACCC AACACACATG AAAGCTATAT TATCTTACAT AGATTTACTT 
AGAATGGGGT TAATAAATGT AGATAAATTA ACCATTAAAA ATAAAAACAA ATTCAATGAT 
GAATTTTACA CATCAAATCT CTTTTACATT AGTTATAACT TTTCAGACAA CACTCATTTG 
CTAACAAAAC AAATAAGAAT TGCTAATTCA GAATTAGAAG ATAATTATAA CAAACTATAT 
CACCCAACCC CAGAAACTTT AGAAAATATG TCATTAATTC CTGTTAAAAG TAATAATAGT 
AACAAACCTA AATTTTGTAT AAGTGGAAAT ACCGAATCTA TGATGATGTC AACATTCTCT 
AGTAAAATGC ATATTAAATC TTCCACTGTT ACCACAAGAT TCAATTATAG CAAACAAGAC 
TTGTACAATT TATTTCCAAT TGTTGTGATA GACAAGATTA TAGATCATTC AGGTAATACA 
GCAAAATCTA ACCAACTTTA CACCACCACT TCACATCAGA CATCTTTAGT AAGGAATAGT 
GCATCACTTT ATTGCATGCT TCCTTGGCAT CATGTCAATA GATTTAACTT TGTATTTAGT 
TCCACAGGAT GCAAGATCAG TATAGAGTAT ATTTTAAAAG ATCTTAAGAT TAAGGACCCC 
AGTTGTATAG CATTCATAGG TGAAGGAGCT GGTAACTTAT TATTACGTAC GGTAGTAGAA 
CTTCATCCAG ACATAAGATA CATTTACAGA AGTTTAAAAG ATTGCAATGA TCATAGTTTA 
CCTATTGAAT TTCTAAGGTT ATACAACGGG CATATAAACA TAGATTATGG TGAGAATTTA 
ACCATTCCTG CTACAGATGC AACTAATAAC ATTCATTGGT CTTATTTACA TATAAAATTT
GCAGAACCTA TTAGCATCTT TGTCTGCGAT GCTGAATTAC CTGTTACAGC CAATTGGAGT
AAAATTATAA TTGAATGGAG TAAGCATGTA AGAAAGTGCA AGTACTGTTC TTCTGTAAAT 
AGATGCATTT TAATTGCAAA ATATCATGCT CAAGATGACA TTGATTTCAA ATTAGATAAC 
ATTACTATAT TAAAAACTTA CGTGTGCCTA GGTAGCAAGT TAAAAGGATC TGAAGTTTAC 
TTAATCCTTA CAATAGGCCC TGCAAATATA CTTCCTGTTT TTGATGTTGT ACAAAATGCT 
AAATTGATAC TTTCAAGAAC TAAAAATTTC ATTATGCCTA AAAAAACTGA CAAGGAATCT 
ATCGATGCAG ATATTAAAAG CTTAATACCT TTCCTTTGTT ACCCTATAAC AAAAAAAGGA 
ATTAAGACTT CATTGTCAAA ATTGAAGAGT GTAGTTAATG GAGATATATT ATCATATTCT
ATAGCTGGAC GTAATGAAGT ATTCAGCAAC AAGCTTATAA ACCACAAGCA TATGAATATC 
CTAAAATGGC TAGATCATGT TTTAAATTTT AGATCAGCTG AACTTAATTA CAATCATTTA 
TACATGATAG AGTCCACATA TCCTTACTTA AGTGAATTGT TAAATAGTTT AACAACCAAT 
GAGCTCAAGA AGCTGATTAA AATAACAGGT AGTGTGCTAT ACAACCTTCC CAACGAACAG 
TAGTTTAAAA TATCATTAAC AAGTTTGGTC AAATTTAGAT GCTAACACAT CATTATATTA 
TAGTTATTAA AAAATATACA AACTTTTCAA TAATTTAGCA TATTGATTCC AAAATTATCA 
TTTTAGTCTT AAGGGGTTAA ATAAAAGTCT AAAACTAACA ATTATACATG TGCATTCACA 
ACACAACGAG ACATTAGTTT TTGACACTTT TTTTCTCGT 
(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2166 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu Thr Asp
1 5 10 15

Ser Tyr Leu Lys Gly Val Ile Ser Phe Ser Glu Cys Asn Ala Leu Gly
20 25 30

Ser Tyr Leu Phe Asn Gly Pro Tyr Leu Lys Asn Asp Tyr Thr Asn Leu
35 40 45

Ile Ser Arg Gln Ser Pro Leu Leu Glu His Met Asn Leu Lys Lys Leu
50 55 60

Thr Ile Thr Gln Ser Leu Ile Ser Arg Tyr His Lys Gly Glu Leu Lys
65 70 75 80

Leu Glu Glu Pro Thr Tyr Phe Gln Ser Leu Leu Met Thr Tyr Lys Ser
85 90 95

Met Ser Ser Ser Glu Gln Ile Ala Thr Thr Asn Leu Leu Lys Lys Ile
100 105 110

Ile Arg Arg Ala Ile Glu Ile Ser Asp Val Lys Val Tyr Ala Ile Leu
115 120 125

Asn Lys Leu Gly Leu Lys Glu Lys Asp Arg Val Lys Pro Asn Asn Asn
130 135 140

Ser Gly Asp Glu Asn Ser Val Leu Thr Thr Ile Ile Lys Asp Asp Ile
145 150 155 160

Leu Ser Ala Val Glu Asn Asn Gln Ser Tyr Thr Asn Ser Asp Lys Ser
165 170 175

His Ser Val Asn Gln Asn Ile Thr Ile Lys Thr Thr Leu Leu Lys Lys
180 185 190

Leu Met Cys Ser Met Gln His Pro Pro Ser Trp Leu Ile His Trp Phe
195 200 205

Asn Leu Tyr Thr Lys Leu Asn Asn Ile Leu Thr Gln Tyr Arg Ser Asn
210 215 220

Glu Val Lys Ser His Gly Phe Ile Leu Ile Asp Asn Gln Thr Leu Ser
225 230 235 240

Gly Phe Gln Phe Ile Leu Asn Gln Tyr Gly Cys Ile Val Tyr His Lys
245 250 255

Gly Leu Lys Lys Ile Thr Thr Thr Thr Tyr Asn Gln Phe Leu Thr Trp
260 265 270

Lys Asp Ile Ser Leu Ser Arg Leu Asn Val Cys Leu Ile Thr Trp Ile
275 280 285

Ser Asn Cys Leu Asn Thr Leu Asn Lys Ser Leu Gly Leu Arg Cys Gly
290 295 300



Phe Asn Asn Val Val Leu Ser Gln Leu Phe Leu Tyr Gly Asp Cys Ile
305 310 315 320

Leu Lys Leu Phe His Asn Glu Gly Phe Tyr Ile Ile Lys Glu Val Glu
325 330 335

Gly Phe Ile Met Ser Leu Ile Leu Asn Ile Thr Glu Glu Asp Gln Phe
340 345 350

Arg Lys Arg Phe Tyr Asn Ser Met Leu Asn Asn Ile Thr Asp Ala Ala
355 360 365

Ile Lys Ala Gln Lys Asp Leu Leu Ser Arg Val Cys His Thr Leu Leu
370 375 380

Asp Lys Thr Val Ser Asp Asn Ile Ile Asn Gly Lys Trp Ile Ile Leu
385 390 395 400

Leu Ser Lys Phe Leu Lys Leu Ile Lys Leu Ala Gly Asp Asn Asn Leu
405 410 415

Asn Asn Leu Ser Glu Leu Tyr Phe Leu Phe Arg Ile Phe Gly His Pro
420 425 430

Met Val Asp Glu Arg Gln Ala Met Asp Ser Val Arg Ile Asn Cys Asn
435 440 445

Glu Thr Lys Phe Tyr Leu Leu Ser Ser Leu Ser Thr Leu Arg Gly Ala
450 455 460

Phe Ile Tyr Arg Ile Ile Lys Gly Phe Val Asn Thr Tyr Asn Arg Trp
465 470 475 480

Pro Thr Leu Arg Asn Ala Ile Val Leu Pro Leu Arg Trp Leu Asn Tyr
485 490 495

Tyr Lys Leu Asn Thr Tyr Pro Ser Leu Leu Glu Ile Thr Glu Asn Asp
500 505 510

Leu Ile Ile Leu Ser Gly Leu Arg Phe Tyr Arg Glu Phe His Leu Pro
515 520 525

Lys Lys Val Asp Leu Glu Met Ile Ile Asn Asp Lys Ala Ile Ser Pro
530 535 540

Pro Lys Asp Leu Ile Trp Thr Ser Phe Pro Arg Asn Tyr Met Pro Ser
545 550 555 560

His Ile Gln Asn Tyr Ile Glu His Glu Lys Leu Lys Phe Ser Glu Ser
565 570 575

Asp Arg Ser Arg Arg Val Leu Glu Tyr Tyr Leu Arg Asp Asn Lys Phe
580 585 590

Asn Glu Cys Asp Leu Tyr Asn Cys Val Val Asn Gln Ser Tyr Leu Asn
595 600 605

Asn Ser Asn His Val Val Ser Leu Thr Gly Lys Glu Arg Glu Leu Ser
610 615 620

Val Gly Arg Met Phe Ala Met Gln Pro Gly Met Phe Arg Gln Ile Gln
625 630 635 640

Ile Leu Ala Glu Lys Met Ile Ala Glu Asn Ile Leu Gln Phe Phe Pro
645 650 655

Glu Ser Leu Thr Arg Tyr Gly Asp Leu Glu Leu Gln Lys Ile Leu Glu
660 665 670

Leu Lys Ala Gly Ile Ser Asn Lys Ser Asn Arg Tyr Asn Asp Asn Tyr
675 680 685

Asn Asn Tyr Ile Ser Lys Cys Ser Ile Ile Thr Asp Leu Ser Lys Phe
690 695 700

Asn Gln Ala Phe Arg Tyr Glu Thr Ser Cys Ile Cys Ser Asp Val Leu
705 710 715 720

Asp Glu Leu His Gly Val Gln Ser Leu Phe Ser Trp Leu His Leu Thr
725 730 735

Ile Pro Leu Val Thr Ile Ile Cys Thr Tyr Arg His Ala Pro Pro Phe
740 745 750

Ile Lys Asp His Val Val Asn Leu Asn Glu Val Asp Glu Gln Ser Gly
755 760 765

Leu Tyr Arg Tyr His Met Gly Gly Ile Glu Gly Trp Cys Gln Lys Leu
770 775 780

Trp Thr Ile Glu Ala Ile Ser Leu Leu Asp Leu Ile Ser Leu Lys Gly
785 790 795 800

Lys Phe Ser Ile Thr Ala Leu Ile Asn Gly Asp Asn Gln Ser Ile Asp
805 810 815

Ile Ser Lys Pro Val Arg Leu Ile Glu Gly Gln Thr His Ala Gln Ala
820 825 830

Asp Tyr Leu Leu Ala Leu Asn Ser Leu Lys Leu Leu Tyr Lys Glu Tyr
835 840 845

Ala Gly Ile Gly His Lys Leu Lys Gly Thr Glu Thr Tyr Ile Ser Arg
850 855 860

Asp Met Gln Phe Met Ser Lys Thr Ile Gln His Asn Gly Val Tyr Tyr
865 870 875 880

Pro Ala Ser Ile Lys Lys Val Leu Arg Val Gly Pro Trp Ile Asn Thr
885 890 895

Ile Leu Asp Asp Phe Lys Val Ser Leu Glu Ser Ile Gly Ser Leu Thr
900 905 910

Gln Glu Leu Glu Tyr Arg Gly Glu Ser Leu Leu Cys Ser Leu Ile Phe
915 920 925

Arg Asn Ile Trp Leu Tyr Asn Gln Ile Ala Leu Gln Leu Arg Asn His
930 935 940

Ala Leu Cys Asn Asn Lys Leu Tyr Leu Asp Ile Leu Lys Val Leu Lys
945 950 955 960

His Leu Lys Thr Phe Phe Asn Leu Asp Ser Ile Asp Met Ala Leu Ser
965 970 975

Leu Tyr Met Asn Leu Pro Met Leu Phe Gly Gly Gly Asp Pro Asn Leu
980 985 990

Leu Tyr Arg Ser Phe Tyr Arg Arg Thr Pro Asp Phe Leu Thr Glu Ala
995 1000 1005

Ile Val His Ser Val Phe Val Leu Ser Tyr Tyr Thr Gly His Asp Leu
1010 1015 1020

Gln Asp Lys Leu Gln Asp Leu Pro Asp Asp Arg Leu Asn Lys Phe Leu
1025 1030 1035 1040

Thr Cys Val Ile Thr Phe Asp Lys Asn Pro Asn Ala Glu Phe Val Thr
1045 1050 1055

Leu Met Arg Asp Pro Gln Ala Leu Gly Ser Glu Arg Gln Ala Lys Ile
1060 1065 1070

Thr Ser Glu Ile Asn Arg Leu Ala Val Thr Glu Val Leu Ser Ile Ala
1075 1080 1085

Pro Asn Lys Ile Phe Ser Lys Ser Ala Gln His Tyr Thr Thr Thr Glu
1090 1095 1100

Ile Asp Leu Asn Asp Ile Met Gln Asn Ile Glu Pro Thr Tyr Pro His
1105 1110 1115 1120

Gly Leu Arg Val Val Tyr Glu Ser Leu Pro Phe Tyr Lys Ala Glu Lys
1125 1130 1135

Ile Val Asn Leu Ile Ser Gly Thr Lys Ser Ile Thr Asn Ile Leu Glu
1140 1145 1150

Lys Thr Ser Ala Ile Asp Thr Thr Asp Ile Asn Arg Ala Thr Asp Met
1155 1160 1165

Met Arg Lys Asn Ile Thr Leu Leu Ile Arg Ile Leu Pro Leu Asp Cys
1170 1175 1180

Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser Ile Thr
1185 1190 1195 1200

Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn Ile
1205 1210 1215

Val Gly Val Thr Ser Pro Ser Ile Met Phe Thr Met Asp Ile Lys Tyr
1220 1225 1230

Thr Thr Ser Thr Ile Ala Ser Gly Ile Ile Ile Glu Lys Tyr Asn Val
1235 1240 1245

Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly
1250 1255 1260

Ser Ser Thr Gln Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gln Val
1265 1270 1275 1280

Leu Thr Lys Lys Gln Arg Asp Gln Ile Asp Leu Leu Ala Lys Leu Asp
1285 1290 1295

Trp Val Tyr Ala Ser Ile Asp Asn Lys Asp Glu Phe Met Glu Glu Leu
1300 1305 1310

Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe
1315 1320 1325

Pro Gln Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser
1330 1335 1340

Arg Pro Cys Glu Phe Pro Ala Ser Ile Pro Ala Tyr Arg Thr Thr Asn
1345 1350 1355 1360

Tyr His Phe Asp Thr Ser Pro Ile Asn His Val Leu Thr Glu Lys Tyr
1365 1370 1375

Gly Asp Glu Asp Ile Asp Ile Val Phe Gln Asn Cys Ile Ser Phe Gly
1380 1385 1390

Leu Ser Leu Met Ser Val Val Glu Gln Phe Thr Asn Ile Cys Pro Asn
1395 1400 1405

Arg Ile Ile Leu Ile Pro Lys Leu Asn Glu Ile His Leu Met Lys Pro
1410 1415 1420

Pro Ile Phe Thr Gly Asp Val Asp Ile Ile Lys Leu Lys Gln Val Ile
1425 1430 1435 1440

Gln Lys Gln His Met Phe Leu Pro Asp Lys Ile Ser Leu Thr Gln Tyr
1445 1450 1455

Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His Ile
1460 1465 1470

Asn Ser Asn Leu Ile Leu Val His Lys Met Ser Asp Tyr Phe His Asn
1475 1480 1485

Ala Tyr Ile Leu Ser Thr Asn Leu Ala Gly His Trp Ile Leu Ile Ile
1490 1495 1500

Gln Leu Met Lys Asp Ser Lys Gly Ile Phe Glu Lys Asp Trp Gly Glu
1505 1510 1515 1520

Gly Tyr Ile Thr Asp His Met Phe Ile Asn Leu Asn Val Phe Phe Asn
1525 1530 1535

Ala Tyr Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys Ala
1540 1545 1550

Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu
1555 1560 1565

Leu Ile Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu
1570 1575 1580

Gln Lys Val Ile Lys Tyr Ile Val Asn Gln Asp Thr Ser Leu Arg Arg
1585 1590 1595 1600

Ile Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asn
1605 1610 1615

Asn Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn Ile Asp Tyr His
1620 1625 1630

Pro Thr His Met Lys Ala Ile Leu Ser Tyr Ile Asp Leu Val Arg Met
1635 1640 1645

Gly Leu Ile Asn Val Asp Lys Leu Thr Ile Lys Asn Lys Asn Lys Phe
1650 1655 1660

Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr Ile Ser Tyr Asn Phe
1665 1670 1675 1680

Ser Asp Asn Thr His Leu Leu Thr Lys Gln Ile Arg Ile Ala Asn Ser
1685 1690 1695

Glu Leu Glu Asp Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr
1700 1705 1710

Leu Glu Asn Met Ser Leu Ile Pro Val Lys Ser Asn Asn Ser Asn Lys
1715 1720 1725

Pro Lys Phe Cys Ile Ser Gly Asn Thr Glu Ser Met Met Met Ser Thr
1730 1735 1740

Phe Ser Ser Lys Met His Ile Lys Ser Ser Thr Val Thr Thr Arg Phe
1745 1750 1755 1760

Asn Tyr Ser Lys Gln Asp Leu Tyr Asn Leu Phe Pro Ile Val Val Ile
1765 1770 1775

Asp Lys Ile Ile Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gln Leu
1780 1785 1790

Tyr Thr Thr Thr Ser His Gln Thr Ser Leu Val Arg Asn Ser Ala Ser
1795 1800 1805

Leu Tyr Cys Met Leu Pro Trp His His Val Asn Arg Phe Asn Phe Val
1810 1815 1820

Phe Ser Ser Thr Gly Cys Lys Ile Ser Ile Glu Tyr Ile Leu Lys Asp
1825 1830 1835 1840

Leu Lys Ile Lys Asp Pro Ser Cys Ile Ala Phe Ile Gly Glu Gly Ala
1845 1850 1855

Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp Ile Arg
1860 1865 1870

Tyr Ile Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro Ile
1875 1880 1885

Glu Phe Leu Arg Leu Tyr Asn Gly His Ile Asn Ile Asp Tyr Gly Glu
1890 1895 1900

Asn Leu Thr Ile Pro Ala Thr Asp Ala Thr Asn Asn Ile His Trp Ser
1905 1910 1915 1920

Tyr Leu His Ile Lys Phe Ala Glu Pro Ile Ser Ile Phe Val Cys Asp
1925 1930 1935

Ala Glu Leu Pro Val Thr Ala Asn Trp Ser Lys Ile Ile Ile Glu Trp
1940 1945 1950

Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys
1955 1960 1965

Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu
1970 1975 1980

Asp Asn Ile Thr Ile Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu
1985 1990 1995 2000

Lys Gly Ser Glu Val Tyr Leu Ile Leu Thr Ile Gly Pro Ala Asn Ile
2005 2010 2015

Leu Pro Val Phe Asp Val Val Gln Asn Ala Lys Leu Ile Leu Ser Arg
2020 2025 2030

Thr Lys Asn Phe Ile Met Pro Lys Lys Thr Asp Lys Glu Ser Ile Asp
2035 2040 2045

Ala Asp Ile Lys Ser Leu Ile Pro Phe Leu Cys Tyr Pro Ile Thr Lys
2050 2055 2060

Lys Gly Ile Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Asn Gly
2065 2070 2075 2080

Asp Ile Leu Ser Tyr Ser Ile Ala Gly Arg Asn Glu Val Phe Ser Asn
2085 2090 2095

Lys Leu Ile Asn His Lys His Met Asn Ile Leu Lys Trp Leu Asp His
2100 2105 2110

Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met
2115 2120 2125

Ile Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr
2130 2135 2140

Thr Asn Glu Leu Lys Lys Leu Ile Lys Ile Thr Gly Ser Val Leu Tyr
2145 2150 2155 2160

Asn Leu Pro Asn Glu Gln
2165

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15219 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

ACGGGAAAAA AATGCGTACT ACAAACTTGC ACATTCGAAA AAAATGGGGC AAATAAGAAC 60
TTGATAAGTG CTATTTAAGT CTAACCTTTT CAATCAGAAA TGGGGTGCAA TTCACTGAGC 120
ATGATAAAGG TTAGATTACA AAATTTATTT GACAATGACG AAGTAGCATT GTTAAAAATA 180
ACATGTTATA CTGATAAATT AATTCTTCTG ACCAATGCAT TAGCCAAAGC AGCAATACAT 240
ACAATTAAAT TAAACGGCAT AGTTTTTATA CATGTTATAA CAAGCAGTGA AGTGTGCCCT 300
GATAACAATA TTGTAGTGAA ATCTAACTTT ACAACAATGC CAATACTACA AAATGGAGGA 360
TACATATGGG AATTGATTGA GTTGACACAC TGCTCTCAAT TAAACGGTTT AATGGATGAT 420
AATTGTGAAA TCAAATTTTC TAAAAGACTA AGTGACTCAG TAATGACTAA TTATATGAAT 480
CAAATATCTG ACTTACTTGG GCTTGATCTC AATTCATGAA TTATGTTTAG TCTAATTCAA 540
TAGACATGTG TTTATTACCA TTTTAGTTAA TATAAAAACT CATCAAAGGG AAATGGGGCA 600
AATAAACTCA CCTAATCAAT CAAACCATGA GCACTACAAA TGACAACACT ACTATGCAAA 660
GATTGATGAT CACAGACATG AGACCCCTGT CAATGGATTC AATAATAACA TCTCTTACCA 720
AAGAAATCAT CACACACAAA TTCATATACT TGATAAACAA TGAATGTATT GTAAGAAAAC 780
TTGATGAAAG ACAAGCTACA TTTACATTCT TAGTCAATTA TGAGATGAAG CTACTGCACA 840
AAGTAGGGAG TACCAAATAC AAAAAATACA CTGAATATAA TACAAAATAT GGCACTTTCC 900
CCATGCCTAT ATTTATCAAT CACGGCGGGT TTCTAGAATG TATTGGCATT AAGCCTACAA 960
AACACACTCC TATAATATAC AAATATGACC TCAACCCGTG AATTCCAACA AAAAAACCAA 1020
CCCAACCAAA CCAAACTATT CCTCAAACAA CAGTGCTCAA TAGTTAAGAA GGAGCTAATC 1080
CATTTTAGTA ATTAAAAATA AAAGTAAAGC CAATAACATA AATTGGGGCA AATACAAAGA 1140
TGGCTCTTAG CAAAGTCAAG TTGAATGATA CATTAAATAA GGATCAGCTG CTGTCATCCA 1200
GCAAATACAC TATTCAACGT AGTACAGGAG ATAATATTGA CACTCCCAAT TATGATGTGC 1260
AAAAACACCT AAACAAACTA TGTGGTATGC TATTAATCAC TGAAGATGCA AATCATAAAT 1320
TCACAGGATT AATAGGTATG TTATATGCTA TGTCCAGGTT AGGAAGGGAA GACACTATAA 1380
AGATACTTAA AGATGCTGGA TATCATGTTA AAGCTAATGG AGTAGATATA ACAACATATC 1440
GTCAAGATAT AAATGGAAAG GAAATGAAAT TCGAAGTATT AACATTATCA AGCTTGACAT 1500
CAGAAATACA AGTCAATATT GAGATAGAAT CTAGAAAGTC CTACAAAAAA ATGCTAAAAG 
AGATGGGAGA AGTGGCTCCA GAATATAGGC ATGATTCTCC AGACTGTGGG ATGATAATAC 
TGTGTATAGC TGCACTTGTG ATAACCAAAT TAGCAGCAGG AGACAGATCA GGTCTTACAG 
CAGTAATTAG GAGGGCAAAC AATGTCTTAA AAAACGAAAT AAAACGATAC AAGGGCCTCA 
TACCAAAGGA TATAGCTAAC AGTTTTTATG AAGTGTTTGA AAAACACCCT CATCTTATAG 
ATGTTTTCGT GCACTTTGGC ATTGCACAAT CATCCACAAG AGGGGGTAGT AGAGTTGAAG
GAATCTTTGC AGGATTGTTT ATGAATGCCT ATGGTTCAGG GCAAGTAATG CTAAGATGGG 
GAGTTTTAGC CAAATCTGTA AAAAATATCA TGCTAGGACA TGCTAGTGTC CAGGCAGAAA 
TGGAGCAAGT TGTGGAAGTC TATGAGTATG CACAGAAGTT GGGAGGAGAA GCTGGATTCT 
ACCATATATT GAACAATCCA AAAGCATCAT TGCTGTCATT AACTCAATTT CCCAACTTCT 
CAAGTGTGGT CCTAGGCAAT GCAGCAGGTC TAGGCATAAT GGGAGAGTAT AGAGGTACAC 
CAAGAAACCA GGATCTTTAT GATGCAGCTA AAGCATATGC AGAGCAACTC AAAGAAAATG 
GAGTAATAAA CTACAGTGTA TTAGACTTAA CAGCAGAAGA ATTGGAAGCC ATAAAGCATC 
AACTCAACCC CAAAGAAGAT GATGTAGAGC TTTAAGTTAA CAAAAAATAC GGGGCAAATA 
AGTCAACATG GAGAAGTTTG CACCTGAATT TCATGGAGAA GATGCAAATA ACAAAGCTAC 
CAAATTCCTA GAATCAATAA AGGGCAAGTT CGCATCATCC AAAGATCCTA AGAAGAAAGA 
TAGCATAATA TCTGTTAACT CAATAGATAT AGAAGTAACT AAAGAGAGCC CGATAACATC 
TGGCACCAAC ATCATCAATC CAACAAGTGA AGCCGACAGT ACCCCAGAAA CAAAAGCCAA 
CTACCCAAGA AAACCCCTAG TAAGCTTCAA AGAAGATCTC ACCCCAAGTG ACAACCCTTT
TTCTAAGTTG TACAAGGAAA CAATAGAAAC ATTTGATAAC AATGAAGAAG AATCTAGCTA 
CTCATATGAA GAGATAAATG ATCAAACAAA TGACAACATT ACAGCAAGAC TAGATAGAAT 
TGATGAAAAA TTAAGTGAAA TATTAGGAAT GCTCCATACA TTAGTAGTTG CAAGTGCAGG 
ACCCACTTCA GCTCGCGATG GAATAAGAGA TGCTATGGTT GGTCTAAGAG AAGAGATGAT 
AGAAAAAATA AGAGCGGAAG CATTAATGAC CAATGATAGG TTAGAGGCTA TGGCAAGACT 
TAGGAATGAG GAAAGCGAAA AAATGGCAAA AGACACCTCA GATGAAGTGT CTCTTAATCC
AACTTCCAAA AAATTGAGTG ACTTGTTGGA AGACAACGAT AGTGACAATG ATCTATCACT
TGATGATTTT TGATCAGCGA TCAACTCACT CAGCAATCAA CAACATCAAT AAAACAGACA
TCAATCCATT GAATCAACTG CCAGACCGAA CAAACAAACG TCCATCAGTA GAACCACCAA 
CCAATCAATC AACCAATTGA TCAATCAGCA ACCCGACAAA ATTAACAATA TAGTAACAAA 
AAAAGAACAA GATGGGGCAA ATATGGAAAC ATACGTGAAC AAGCTTCACG AAGGCTCCAC 
ATACACAGCA GCTGTTCAGT ACAATGTTCT AGAAAAAGAT GATGATCCTG CATCACTAAC 
AATATGGGTG CCTATGTTCC AGTCATCTGT GCCAGCAGAC TTGCTCATAA AAGAACTTGC
AAGCATCAAT ATACTAGTGA AGCAGATCTC TACGCCCAAA GGACCTTCAC TACGAGTCAC 
GATTAACTCA AGAAGTGCTG TGCTGGCTCA AATGCCTAGT AATTTCATCA TAAGCGCAAA 
TGTATCATTA GATGAAAGAA GCAAATTAGC ATATGATGTA ACTACACCTT GTGAAATCAA 
AGCATGCAGT CTAACATGCT TAAAAGTAAA AAGTATGTTA ACTACAGTCA AAGATCTTAC 
CATGAAGACA TTCAACCCCA CTCATGAGAT CATTGCTCTA TGTGAATTTG AAAATATTAT
GACATCAAAA AGAGTAATAA TACCAACCTA TCTAAGATCA ATTAGTGTCA AGAACAAGGA 
TCTGAACTCA CTAGAAAATA TAGCAACCAC CGAATTCAAA AATGCTATCA CCAATGCAAA 
AATTATTCCT TATGCAGGAT TAGTGTTAGT TATCACAGTT ACTGACAATA AAGGAGCATT 
CAAATATATC AAACCACAGA GTCAATTTAT AGTAGATCTT GGTGCCTACC TAGAAAAAGA 
GAGCATATAT TATGTGACTA CTAATTGGAA GCATACAGCT ACACGTTTTT CAATCAAACC
ACTAGAGGAT TAAACTTAAT TATCAACACT GAATGACAGG TCCACATATA TCCTCAAACT 
ACACACTATA TCCAAACATC ATAAACATCT ACACTACACA CTTCATCACA CAAACCAATC 
CCACTCAAAA TCCAAAATCA CTACCAGCCA CTATCCGCTA GACCTAGAGT GCGAATAGGC 
AAATAAAACC AAAATATGGG GTAAATAGAC ATTAGTTAGA GTTCAATCAA TCTTAACAAC 
CATTTATACC GCCAATTCAA CACATATACT ATAAATCTTA AAATGGGAAA TACATCCATC 
ACAATAGAAC TCACAAGCAA ATTTTGGCCC TATTTTACAC TAATACATAT GATCTTAACT 
CTAATCTTTT TACTAATTAT AATCACTATC ATGATTGCAA CACTAAATAA GCTAAGTGAA
CACAAAGCAT TCTGCAACAA AACTCTTGAA CTAGGACAGA TGTACCAAAT CAACACACAG 
AGTTCCACCA TTATGCTGTG TCAAACCATA ATCCTGTATA TACAAACAAA CAAATCCAAT
CCTCTCACAG AGTCACGGTG TCGCAAAACC ACGCTAACCA TCATGGTAGC ATAGAGTAGT
TATTTAAAAA TTAACATAAT GATGAATTGT TAGTATGAGA TCAAAAACAA CATTGGGGCA 4680
AATGCAACCA TGTCCAAACA CAAGAATCAA CGCACTGCCA GGACTCTAGA AAAGACCTGG 4740
GATACTCTTA ATCATCTAAT TGTAATATCC TCTTGTTTAT ACAGATTAAA TTTAAAATCT 4800
ATAGCACAAA TAGCACTATC AGTTTTGGCA ATGATAATCT CAACCTCTCT CATAATTGCA 4860
GCCATAATAT TCATCATCTC TGCCAATCAC AAAGTTACAC TAACAACGGT CACAGTTCAA 4920
ACAATAAAAA ACCACACTGA AAAAAACATC ACCACCTACC CTACTCAAGT CTCACCAGAA 4980
AGGGTTAGTT CATCCAAGCA ACCCACAACC ACATCACCAA TCCACACAAG TTCAGCTACA 5040
ACATCACCCA ATACAAAATC AGAAACACAC CATACAACAG CACAAACCAA AGGCAGAACC 5100
ACCACTTCAA CACAGACCAA CAAGCCAAGC ACAAAACCAC GTCCAAAAAA TCCACCAAAA 5160
AAAGATGATT ACCATTTTGA AGTGTTCAAC TTCGTTCCCT GCAGTATATG TGGCAACAAT 5220
CAACTTTGCA AATCCATCTG CAAAACAATA CCAAGCAACA AACCAAAGAA GAAACCAACC 5280
ATCAAACCCA CAAACAAACC AACCACCAAA ACCACAAACA AAAGAGACCC AAAAACACCA 5340
GCCAAAACGA CGAAAAAAGA AACTACCACC AACCCAACAA AAAAACTAAC CCTCAAGACC 5400
ACAGAAAGAG ACACCAGCAC CTCACAATCC ACTGCACTCG ACACAACCAC ATTAAAACAC 5460
ACAGTCCAAC AGCAATCCCT CCTCTCAACC ACCCCCGAAA ACACACCCAA CTCCACACAA 5520
ACACCCACAG CATCCGAGCC CTCCACACCA AACTCCACCC AAAAAACCCA GCCACATGCT 5580
TAGTTATTCA AAAACTACAT CTTAGCAGAG AACCGTGATC TATCAAGCAA GAACGAAATT 5640
AAACCTGGGG CAAATAACCA TGGAGTTGAT GATCCACAAG TCAAGTGCAA TCTTCCTAAC 5700
TCTTGCTATT AATGCATTGT ACCTCACCTC AAGTCAGAAC ATAACTGAGG AGTTTTACCA 5760
ATCGACATGT AGTGCAGTTA GCAGAGGTTA TTTTAGTGCT TTAAGAACAG GTTGGTATAC 5820
TAGTGTCATA ACAATAGAAT TAAGTAATAT AAAAGAAACC AAATGCAATG GAACTGACAC 5880
TAAAGTAAAA CTTATGAAAC AAGAATTAGA TAAGTATAAG AATGCAGTAA CAGAATTACA 5940
GCTACTTATG CAAAACACAC CAGCTGTCAA CAACCGGGCC AGAAGAGAAG CACCACAGTA 6000
TATGAACTAC ACAATCAATA CCACTAAAAA CCTAAATGTA TCAATAAGCA AGAAGAGGAA 6060
ACGAAGATTT CTAGGCTTCT TGTTAGGTGT GGGATCTGCA ATAGCAAGTG GTATAGCTGT 6120
ATCAAAAGTT CTACACCTTG AAGGAGAAGT GAACAAGATC AAAAATGCTT TGTTGTCTAC 6180
AAACAAAGCT GTAGTCAGTT TATCAAATGG GGTCAGTGTT TTAACCAGCA AAGTGTTAGA 6240 
TCTCAAGAAT TACATAAATA ACCAATTATT ACCCATAGTA AATCAACAGA GCTGTCGCAT 6300 
CTCCAACATT GAAACAGTTA TAGAATTCCA GCAGAAGAAC AGCAGATTGT TGGAAATCAC 6360 
CAGAGAATTT AGTGTCAATG CAGGTGTAAC AACACCTTTA AGCACTTACA TGTTGACAAA 6420 
CAGTGAGTTA CTATCATTAA TCAATGATAT GCCTATAACA AATGATCAGA AAAAATTAAT 6480 

GTCAAGCAAT GTTCAGATAG TAAGGCAACA AAGTTATTCC ATCATGTCTA TAATAAAGGA 6540 

AGAAGTCCTT GCATATGTTG TACAGCTGCC TATCTATGGT GTAATAGATA CACCTTGCTG 6600 

GAAATTGCAC ACATCGCCTC TATGCACTAC CAACATCAAA GAAGGATCAA ATATTTGTTT 6660 

AACAAGGACT GATAGAGGAT GGTATTGTGA TAATGCAGGA TCAGTATCCT TCTTTCCACA 6720 

GGCTGACACT TGTAAAGTAC AGTCCAATCG AGTATTTTGT GACACTATGA ACAGTTTGAC 6780 

ATTACCAAGT GAAGTCAGCC TTTGTAACAC TGACATATTC AATTCCAAGT ATGACTGCAA 6840 

AATTATGACA TCAAAAACAG ACATAAGCAG CTCAGTAATT ACTTCTCTTG GAGCTATAGT 6900 

GTCATGCTAT GGTAAAACTA AATGCACTGC ATCCAACAAA AATCGTGGGA TTATAAAGAC 6960 

ATTTTCTAAT GGTTGTGACT ATGTGTCAAA CAAAGGAGTA GATACTGTGT CAGTGGGCAA 7020 

CACTTTATAC TATGTAAACA AGCTGGAAGG CAAGAACCTT TATGTAAAAG GGGAACCTAT 7080 

AATAAATTAC TATGACCCTC TAGTGTTTCC TTCTGATGAG TTTGATGCAT CAATATCTCA 7140 

AGTCAATGAA AAAATCAATC AAAGTTTAGC TTTTATTCGT AGATCTGATG AATTACTACA 7200 

TAATGTAAAT ACTGGCAAAT CTACTACAAA TATTATGATA ACTACAATTA TTATAGTAAT 7260 

CATTGTAGTA TTGTTATCAT TAATAGCTAT TGGTTTACTG TTGTATTGTA AAGCCAAAAA 7320 

CACACCAGTT ACACTAAGCA AAGACCAACT AAGTGGAATC AATAATATTG CATTCAGCAA 7380 

ATAGACAAAA AACCACCTGA TCATGTTTCA ACAACAATCT GCTGACCACC AATCCCAAAT 7440 

CAACTTACAA CAAATATTTC AACATCACAG TACAGGCTGA ATCATTTCCT CACATCATGC 7500 

TACCCACATA ACTAAGCTAG ATCCTTAACT TATAGTTACA TAAAAACCTC AAGTATCACA 7560 

ATCAACCACT AAATCAACAC ATCATTCACA AAATTAACAG CTGGGGCAAA TATGTCGCGA 7620 

AGAAATCCTT GTAAATTTGA GATTAGAGGT CATTGCTTGA ATGGTAGAAG ATGTCACTAC 7680 

AGTCATAATT ACTTTGAATG GCCTCCTCAT GCATTACTAG TGAGGCAAAA CTTCATGTTA 7740
AACAAGATAC TCAAGTCAAT GGACAAAAGC ATAGACACTT TGTCTGAAAT AAGTGGAGCT
GCTGAACTGG ATAGAACAGA AGAATATGCT CTTGGTATAG TTGGAGTGCT AGAGAGTTAC 
ATAGGATCTA TAAACAACAT AACAAAACAA TCAGCATGTG TTGCTATGAG TAAACTTCTT 
ATTGAGATCA ATAGTGATGA CATTAAAAAG CTTAGAGATA ATGAAGAACC CAATTCACCT 
AAGATAAGAG TGTACAATAC TGTTATATCA TACATTGAGA GCAATAGAAA AAACAACAAG 
CAAACCATCC ATCTGCTCAA GAGACTACCA GCAGACGTGC TGAAGAAGAC AATAAAGAAC 
ACATTAGATA TCCACAAAAG CATAACCATA AGCAATCCAA AAGAGTCAAC TGTGAATGAT
CAAAATGACC AAACCAAAAA TAATGATATT ACCGGATAAA TATCCTTGTA GTATATCATC 
CATATTGATC TCAAGTGAAA GCATGGTTGC TACATTCAAT CATAAAAACA TATTACAATT 
TAACCATAAC TATTTGGATA ACCACCAGCG TTTATTAAAT CATATATTTG ATGAAATTCA 
TTGGACACCT AAAAACTTAT TAGATGCCAC TCAACAATTT CTCCAACATC TTAACATCCC 
TGAAGATATA TATACAGTAT ATATATTAGT GTCATAATGC TTGACCATAA CGACTCTATG 
TCATCCAACC ATAAAACTAT TTTGATAAGG TTATGGGACA AAATGGATCC CATTATTAAT 
GGAAACTCTG CTAATGTGTA TCTAACTGAT AGTTATTTAA AAGGTGTTAT CTCTTTTTCA 
GAGTGTAATG CTTTAGGGAG TTATCTTTTT AACGGCCCTT ATCTTAAAAA TGATTACACC
AACTTAATTA GTAGACAAAG CCCACTACTA GAGCATATGA ATCTTAAAAA ACTAACTATA 
ACACAGTCAT TAATATCTAG ATATCATAAA GGTGAACTGA AATTAGAAGA ACCAACTTAT 
TTCCAGTCAT TACTTATGAC ATATAAAAGT ATGTCCTCGT CTGAACAAAT TGCTACAACT 
AACTTACTTA AAAAAATAAT ACGAAGAGCC ATAGAAATAA GTGATGTAAA GGTGTACGCC 
ATCTTGAATA AACTAGGATT AAAGGAAAAG GACAGAGTTA AGCCCAACAA TAATTCAGGT 
GATGAAAACT CAGTACTTAC AACCATAATT AAAGATGATA TACTTTCGGC TGTGGAAAAC 
AATCAATCAT ATACAAATTC AGACAAAAGT CACTCAGTAA ATCAAAATAT CACTATCAAA 
ACAACACTCT TGAAAAAATT GATGTGTTCA ATGCAACATC CTCCATCATG GTTAATACAC 
TGGTTCAATT TATATACAAA ATTAAATAAC ATATTAACAC AATATCGATC AAATGAGGTA 
AAAAGTCATG GGTTTATATT AATAGATAAT CAAACTTTAA GTGGTTTTCA GTTTATTTTA
AATCAATATG GTTGTATCGT TTATCATAAA GGACTCAAAA AAATCACAAC TACTACTTAC
AATCAATTTT TGACATGGAA AGACATCAGC CTTAGCAGAT TAAATGTTTG CTTAATTACT
TGGATAAGTA ATTGTTTAAA TACATTAAAC AAAAGCTTAG GGCTGAGATG TGGATTCAAT 
AATGTTGTGT TATCACAATT ATTTCTTTAT GGAGATTGTA TACTGAAATT ATTTCATAAT 
GAAGGCTTCT ACATAATAAA AGAAGTAGAG GGATTTATTA TGTCTTTAAT TCTAAACATA 
ACAGAAGAAG ATCAATTTAA GAAACGATTT TATAATAGCA TGCTAAATAA CATCACAGAT 
GCAGCTATTA AGGCTCAAAA GGACCTACTA TCAAGAGTAT GTCACACTTT ATTAGACAAG 
ACAGTGTCTG ATAATATCAT AAATGGTAAA TGGATAATCC TATTAAGTAA ATTTCTTAAA 
TTGATTAAGC TTGCAGGTGA TAATAATCTC AATAACTTGA GTGAGCTATA TTTTCTCTTC 
AGAATCTTTG GACATCCAAT GGTCGATGAA AGACAAGCAA TGGATTCTGT AAGAATTAAC 
TGTAATGAAA CTAAGTTCTA CTTATTAAGT AGTCTAAGTA CATTAAGAGG TGCTTTCATT 
TATAGAATCA TAAAAGGGTT TGTAAATACC TACAACAGAT GGCCCACCTT AAGGAATGCT 
ATTGTCCTAC CTCTAAGATG GTTAAACTAC TATAAACTTA ATACTTATCC ATCTCTACTT 
GAAATCACAG AAAATGATTT GATTATTTTA TCAGGATTGC GGTTCTATCG TGAGTTTCAT 
CTGCCTAAAA AAGTGGATCT TGAAATGATA ATAAATGACA AAGCCATTTC ACCTCCAAAA 
GATCTAATAT GGACTAGTTT TCCTAGAAAT TACATGCCAT CACATATACA AAATTATATA 
GAACATGAAA AGTTGAAGTT CTCTGAAAGC GACAGATCGA GAAGAGTACT AGAGTATTAC 
TTGAGAGATA ATAAATTCAA TGAATGCGAT CTATACAATT GTGTAGTCAA TCAAAGCTAT 
CTCAACAACT CTAATCACGT GGTATCACTA ACTGGTAAAG AAAGAGAGCT CAGTGTAGGT 
AGAATGTTTG CTATGCAACC AGGTATGTTT AGGCAAATCC AAATCTTAGC AGAGAAAATG 
ATAGCTGAAA ATATTTTACA ATTCTTCCCT GAGAGTTTGA CAAGATATGG TGATCTAGAG 
CTTCAAAAGA TATTAGAATT AAAAGCAGGA ATAAGCAACA AGTCAAATCG TTATAATGAT 
AACTACAACA ATTATATCAG TAAATGTTCT ATCATTACAG ATCTTAGCAA ATTCAATCAG 
GCATTTAGAT ATGAAACATC ATGTATCTGC AGTGATGTAT TAGATGAACT GCATGGAGTA 
CAATCTCTGT TCTCTTGGTT GCATTTAACA ATACCTCTTG TCACAATAAT ATGTACATAT 
AGACATGCAC CTCCTTTCAT AAAGGATCAT GTTGTTAATC TTAATGAGGT TGATGAACAA
AGTGGATTAT ACAGATATCA TATGGGTGGT ATTGAGGGCT GGTGTCAAAA ACTGTGGACC
ATTGAAGCTA TATCATTATT AGATCTAATA TCTCTCAAAG GGAAATTCTC TATCACAGCT
CTGATAAATG GTGATAATCA GTCAATTGAT ATAAGCAAAC CAGTTAGACT TATAGAGGGT
CAGACCCATG CACAAGCAGA TTATTTGTTA GCATTAAATA GCCTTAAATT GTTATATAAA
GAGTATGCAG GTATAGGCCA TAAGCTTAAG GGAACAGAGA CCTATATATC CCGAGATATG 
CAGTTCATGA GCAAAACAAT CCAGCACAAT GGAGTGTACT ATCCAGCCAG TATCAAAAAA
GTCCTGAGAG TAGGTCCATG GATAAACACG ATACTTGATG ATTTTAAAGT TAGTTTAGAA 
TCTATAGGCA GCTTAACACA GGAGTTAGAA TACAGAGGAG AAAGCTTATT ATGCAGTTTA 
ATATTTAGGA ACATTTGGTT ATACAATCAA ATTGCTTTGC AACTCCGAAA TCATGCATTA 
TGTAACAATA AGCTATATTT AGATATATTG AAAGTATTAA AACACTTAAA AACTTTTTTT 
AATCTTGATA GCATTGATAT GGCTTTATCA TTGTATATGA ATTTGCCTAT GCTGTTTGGT 
GGTGGTGATC CTAATTTGTT ATATCGAAGC TTTTATAGGA GAACTCCAGA CTTCCTTACA 
GAAGCTATAG TACATTCAGT GTTTGTGTTG AGCTATTATA CTGGTCACGA TTTACAAGAT 
AAGCTCCAGG ATCTTCCAGA TGATAGACTG AACAAATTCT TGACATGTGT CATCACATTT
GATAAAAATC CCAATGCCGA GTTTGTAACA TTGATGAGGG ATCCACAGGC TTTAGGGTC1 
GAAAGGCAAG CTAAAATTAC TAGTGAGATT AATAGATTAG CAGTAACAGA AGTCTTAAGT 
ATAGCCCCAA ACAAAATATT TTCTAAAAGT GCACAACATT ATACTACCAC TGAGATTGAT 
CTAAATGACA TTATGCAAAA TATAGAACCA ACTTACCCTC ATGGATTAAG AGTTGTTTAT 
GAAAGTTTAC CTTTTTATAA AGCAGAAAAA ATAGTTAATC TTATATCAGG AACAAAATCC 
ATAACTAATA TACTTGAAAA AACATCAGCA ATAGATACAA CTGATATTAA TAGGGCTACT 
GATATGATGA GGAAAAATAT AACTTTACTT ATAAGGATAC TTCCACTAGA TTGTAACAAA 
GACAAAAGAG AGTTATTAAG TTTAGAAAAT CTTAGTATAA CTGAATTAAG CAAGTATGTA
AGAGAAAGAT CTTGGTCATT ATCCAATATA GTAGGAGTAA CATCGCCAAG TATTATGTTC 
ACAATGAACA TTAAATATAC AACTAGCACT ATAGCCAGTG GTATAATAAT AGAAAAATAT 
AATGTTAATA GTTTAACTCG TGGTGAAAGA GGACCCACCA AGCCATGGGT AGGCTCATCC 
ACGCAGGAGA AAAAAACAAT GCCAGTGTAC AACAGACAAG TTTTAACCAA AAAGCAAAGA 
GACCAAATAG ATTTATTAGC AAAATTAGAC TGGGTATATG CATCCATAGA CAACAAAGAT
GAATTCATGG AAGAACTGAG TACTGGAACA CTTGGACTGT CATATGAAAA AGCCAAAAAG
TTGTTTCCAC AATATCTAAG TGTCAATTAT TTACACCCTT TAACAGTCAG TAGTAGACCA
TGTGAATTCC CTGCATCAAT ACCAGCTTAT AGAACAACAA ATTATCATTT TGATACTAGT
CCTATCAATC ATGTATTAAC AGAAAAGTAT GGAGATGAAG ATATCGACAT TGTGTTTCAA
AATTGCATAA GTTTTGGTCT TAGCCTGATG TCGGTTGTGG AACAATTCAC AAACATATGT 
CCTAATAGAA TTATTCTCAT ACCGAAGCTG AATGAGATAC ATTTGATGAA ACCTCCTATA 
TTTACAGGAG ATGTTGATAT CATCAAGTTG AAGCAAGTGA TACAAAAGCA GCACATGTTC 
CTACCAGATA AAATAAGTTT AACCCAATAT GTAGAATTAT TCTTAAGTAA CAAAGCACTT 
AAATCTGGAT CTCACATCAA CTCTAATTTA ATATTAGTAC ATAAAATGTC TGATTATTTT 
CATAATGCTT ATATTTTAAG TACTAATTTA GCTGGACATT GGATTCTGAT TATTCAACTT 
ATGAAAGATT CAAAAGGTAT TTTTGAAAAA GATTGGGGAG AGGGGTACAT AACTGATCAT 
ATGTTCATTA ATTTGAATGT TTTCTTTAAT GCTTATAAGA CTTATTTGCT ATGTTTTCAT 
AAAGGTTATG GTAAAGCAAA ATTAGAATGT GATATGAACA CTTCAGATCT TCTTTGTGTT 
TTGGAGTTAA TAGACAGTAG CTACTGGAAA TCTATGTCTA AAGTTTTCCT AGAACAAAAA 
GTCATAAAAT ACATAGTCAA TCAAGACACA AGTTTGCGTA GAATAAAAGG CTGTCACAGT 
TTTAAGTTGT GGTTTTTAAA ACGCCTTAAT AATGCTAAAT TTACCGTATG CCCTTGGGTT 
GTTAACATAG ATTATCACCC AACACACATG AAAGCTATAT TATCTTACAT AGATTTAGTT 
AGAATGGGGT TAATAAATGT AGATAAATTA ACCATTAAAA ATAAAAACAA ATTCAATGAT 
GAATTTTACA CATCAAATCT CTTTTACATT AGTTATAACT TTTCAGACAA CACTCATTTG 
CTAACAAAAC AAATAAGAAT TGCTAATTCA GAATTAGAAG ATAATTATAA CAAACTATAT 
CACCCAACCC CAGAAACTTT AGAAAATATG TCATTAATTC CTGTTAAAAG TAATAATAGT 
AACAAACCTA AATTTTGTAT AAGTGGAAAT ACCGAATCTA TGATGATGTC AACATTCTCT 
AGTAAAATGC ATATTAAATC TTCCACTGTT ACCACAAGAT TCAATTATAG CAAACAAGAC 
TTGTACAATT TATTTCCAAT TGTTGTGATA GACAAGATTA TAGATCATTC AGGTAATACA 
GCAAAATCTA ACCAACTTTA CACCACCACT TCACATCAGA CATCTTTAGT AAGGAATAGT 
GCATCACTTT ATTGCATGCT TCCTTGGCAT CATGTCAATA GATTTAACTT TGTATTTAGT
TCCACAGGAT GCAAGATCAG TATAGAGTAT ATTTTAAAAG ATCTTAAGAT TAAGGACCCC
AGTTGTATAG CATTCATAGG TGAAGGAGCT GGTAACTTAT TATTACGTAC GGTAGTAGAA 
CTTCATCCAG ACATAAGATA CATTTACAGA AGTTTAAAAG ATTGCAATGA TCATAGTTTA 
CCTATTGAAT TTCTAAGGTT ATACAACGGG CATATAAACA TAGATTATGG TGAGAATTTA
ACCATTCCTG CTACAGATGC AACTAATAAC ATTCATTGGT CTTATTTACA TATAAAATTT 
GCAGAACCTA TTAGCATCTT TGTCTGCGAT GCTGAATTAC CTGTTACAGC CAATTGGAGT 
AAAATTATAA TTGAATGGAG TAAGCATGTA AGAAAGTGCA AGTACTGTTC TTCTGTAAAT 
AGATGCATTT TAATTGCAAA ATATCATGCT CAAGATGACA TTGATTTCAA ATTAGATAAC 
ATTACTATAT TAAAAACTTA CGTGTGCCTA GGTAGCAAGT TAAAAGGATC TGAAGTTTAC 
TTAATCCTTA CAATAGGCCC TGCAAATATA CTTCCTGTTT TTGATGTTGT ACAAAATGCT 
AAATTGATAC TTTCAAGAAC TAAAAATTTC ATTATGCCTA AAAAAACTGA CAAGGAATCT 
ATCGATGCAA ATATTAAAAG CTTAATACCT TTCCTTTGTT ACCCTATAAC AAAAAAAGGA 
ATTAAGACTT CATTGTCAAA ATTGAAGAGT GTAGTTAATG GAGATATATT ATCATATTCT 
ATAGCTGGAC GTAATGAAGT ATTCAGCAAC AAGCTTATAA ACCACAAGCA TATGAATATC 
CTAAAATGGC TAGATCATGT TTTAAATTTT AGATCAGCTG AACTTAATTA CAATCATTTA 
TACATGATAG AGTCCACATA TCCTTACTTA AGTGAATTGT TAAATAGTTT AACAACCAAT 
GAGCTCAAGA AGCTGATTAA AATAACAGGT AGTGTGCTAT ACAACCTTCC CAACGAACAG 
TAGTTTAAAA TATCATTAAC AAGTTTGGTC AAATTTAGAT GCTAACACAT CATTATATTA 
TAGTTATTAA AGAATATACA AACTTTTCAA TAATTTAGCA TATTGATTCC AAAATTATCA 
TTTTAGTCTT AAGGGGTTAA ATAAAAGTCT AAAACTAACA ATTATACATG TGCATTCACA 
ACACAACGAG ACATTAGTTT TTGACACTTT TTTTCTCGT 
(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2166 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu Thr Asp
1 5 10 15

Ser Tyr Leu Lys Gly Val Ile Ser Phe Ser Glu Cys Asn Ala Leu Gly
20 25 30

Ser Tyr Leu Phe Asn Gly Pro Tyr Leu Lys Asn Asp Tyr Thr Asn Leu
35 40 45

Ile Ser Arg Gln Ser Pro Leu Leu Glu His Met Asn Leu Lys Lys Leu
50 55 60

Thr Ile Thr Gln Ser Leu Ile Ser Arg Tyr His Lys Gly Glu Leu Lys
65 70 75 80

Leu Glu Glu Pro Thr Tyr Phe Gln Ser Leu Leu Met Thr Tyr Lys Ser
85 90 95

Met Ser Ser Ser Glu Gln Ile Ala Thr Thr Asn Leu Leu Lys Lys Ile
100 105 110

Ile Arg Arg Ala Ile Glu Ile Ser Asp Val Lys Val Tyr Ala Ile Leu
115 120 125

Asn Lys Leu Gly Leu Lys Glu Lys Asp Arg Val Lys Pro Asn Asn Asn
130 135 140

Ser Gly Asp Glu Asn Ser Val Leu Thr Thr Ile Ile Lys Asp Asp Ile
145 150 155 160

Leu Ser Ala Val Glu Asn Asn Gln Ser Tyr Thr Asn Ser Asp Lys Ser
165 170 175

His Ser Val Asn Gln Asn Ile Thr Ile Lys Thr Thr Leu Leu Lys Lys
180 185 190

Leu Met Cys Ser Met Gln His Pro Pro Ser Trp Leu Ile His Trp Phe
195 200 205

Asn Leu Tyr Thr Lys Leu Asn Asn Ile Leu Thr Gln Tyr Arg Ser Asn
210 215 220

Glu Val Lys Ser His Gly Phe Ile Leu Ile Asp Asn Gln Thr Leu Ser
225 230 235 240

Gly Phe Gln Phe Ile Leu Asn Gln Tyr Gly Cys Ile Val Tyr His Lys
245 250 255

Gly Leu Lys Lys Ile Thr Thr Thr Thr Tyr Asn Gln Phe Leu Thr Trp
260 265 270

Lys Asp Ile Ser Leu Ser Arg Leu Asn Val Cys Leu Ile Thr Trp Ile
275 280 285

Ser Asn Cys Leu Asn Thr Leu Asn Lys Ser Leu Gly Leu Arg Cys Gly
290 295 300

Phe Asn Asn Val Val Leu Ser Gln Leu Phe Leu Tyr Gly Asp Cys Ile
305 310 315 320

Leu Lys Leu Phe His Asn Glu Gly Phe Tyr Ile Ile Lys Glu Val Glu
325 330 335

Gly Phe Ile Met Ser Leu Ile Leu Asn Ile Thr Glu Glu Asp Gln Phe
340 345 350

Lys Lys Arg Phe Tyr Asn Ser Met Leu Asn Asn Ile Thr Asp Ala Ala
355 360 365

Ile Lys Ala Gln Lys Asp Leu Leu Ser Arg Val Cys His Thr Leu Leu
370 375 380

Asp Lys Thr Val Ser Asp Asn Ile Ile Asn Gly Lys Trp Ile Ile Leu
385 390 395 400

Leu Ser Lys Phe Leu Lys Leu Ile Lys Leu Ala Gly Asp Asn Asn Leu
405 410 415

Asn Asn Leu Ser Glu Leu Tyr Phe Leu Phe Arg Ile Phe Gly His Pro
420 425 430

Met Val Asp Glu Arg Gln Ala Met Asp Ser Val Arg Ile Asn Cys Asn
435 440 445

Glu Thr Lys Phe Tyr Leu Leu Ser Ser Leu Ser Thr Leu Arg Gly Ala
450 455 460

Phe Ile Tyr Arg Ile Ile Lys Gly Phe Val Asn Thr Tyr Asn Arg Trp
465 470 475 480

Pro Thr Leu Arg Asn Ala Ile Val Leu Pro Leu Arg Trp Leu Asn Tyr
485 490 495

Tyr Lys Leu Asn Thr Tyr Pro Ser Leu Leu Glu Ile Thr Glu Asn Asp
500 505 510

Leu Ile Ile Leu Ser Gly Leu Arg Phe Tyr Arg Glu Phe His Leu Pro
515 520 525

Lys Lys Val Asp Leu Glu Met Ile Ile Asn Asp Lys Ala Ile Ser Pro
530 535 540

Pro Lys Asp Leu Ile Trp Thr Ser Phe Pro Arg Asn Tyr Met Pro Ser
545 550 555 560

His Ile Gln Asn Tyr Ile Glu His Glu Lys Leu Lys Phe Ser Glu Ser
565 570 575

Asp Arg Ser Arg Arg Val Leu Glu Tyr Tyr Leu Arg Asp Asn Lys Phe
580 585 590

Asn Glu Cys Asp Leu Tyr Asn Cys Val Val Asn Gln Ser Tyr Leu Asn
595 600 605

Asn Ser Asn His Val Val Ser Leu Thr Gly Lys Glu Arg Glu Leu Ser
610 615 620

Val Gly Arg Met Phe Ala Met Gln Pro Gly Met Phe Arg Gln Ile Gln
625 630 635 640

Ile Leu Ala Glu Lys Met Ile Ala Glu Asn Ile Leu Gln Phe Phe Pro
645 650 655

Glu Ser Leu Thr Arg Tyr Gly Asp Leu Glu Leu Gln Lys Ile Leu Glu
660 665 670

Leu Lys Ala Gly Ile Ser Asn Lys Ser Asn Arg Tyr Asn Asp Asn Tyr
675 680 685

Asn Asn Tyr Ile Ser Lys Cys Ser Ile Ile Thr Asp Leu Ser Lys Phe
690 695 700

Asn Gln Ala Phe Arg Tyr Glu Thr Ser Cys Ile Cys Ser Asp Val Leu
705 710 715 720

Asp Glu Leu His Gly Val Gln Ser Leu Phe Ser Trp Leu His Leu Thr
725 730 735

Ile Pro Leu Val Thr Ile Ile Cys Thr Tyr Arg His Ala Pro Pro Phe
740 745 750

Ile Lys Asp His Val Val Asn Leu Asn Glu Val Asp Glu Gln Ser Gly
755 760 765

Leu Tyr Arg Tyr His Met Gly Gly Ile Glu Gly Trp Cys Gln Lys Leu
770 775 780

Trp Thr Ile Glu Ala Ile Ser Leu Leu Asp Leu Ile Ser Leu Lys Gly
785 790 795 800

Lys Phe Ser Ile Thr Ala Leu Ile Asn Gly Asp Asn Gln Ser Ile Asp
805 810 815

Ile Ser Lys Pro Val Arg Leu Ile Glu Gly Gln Thr His Ala Gln Ala
820 825 830

Asp Tyr Leu Leu Ala Leu Asn Ser Leu Lys Leu Leu Tyr Lys Glu Tyr
835 840 845

Ala Gly Ile Gly His Lys Leu Lys Gly Thr Glu Thr Tyr Ile Ser Arg
850 855 860

Asp Met Gln Phe Met Ser Lys Thr Ile Gln His Asn Gly Val Tyr Tyr
865 870 875 880

Pro Ala Ser Ile Lys Lys Val Leu Arg Val Gly Pro Trp Ile Asn Thr
885 890 895

Ile Leu Asp Asp Phe Lys Val Ser Leu Glu Ser Ile Gly Ser Leu Thr
900 905 910

Gln Glu Leu Glu Tyr Arg Gly Glu Ser Leu Leu Cys Ser Leu Ile Phe
915 920 925

Arg Asn Ile Trp Leu Tyr Asn Gln Ile Ala Leu Gln Leu Arg Asn His
930 935 940

Ala Leu Cys Asn Asn Lys Leu Tyr Leu Asp Ile Leu Lys Val Leu Lys
945 950 955 960

His Leu Lys Thr Phe Phe Asn Leu Asp Ser Ile Asp Met Ala Leu Ser
965 970 975

Leu Tyr Met Asn Leu Pro Met Leu Phe Gly Gly Gly Asp Pro Asn Leu
980 985 990

Leu Tyr Arg Ser Phe Tyr Arg Arg Thr Pro Asp Phe Leu Thr Glu Ala
995 1000 1005

Ile Val His Ser Val Phe Val Leu Ser Tyr Tyr Thr Gly His Asp Leu
1010 1015 1020

Gln Asp Lys Leu Gln Asp Leu Pro Asp Asp Arg Leu Asn Lys Phe Leu
1025 1030 1035 1040

Thr Cys Val Ile Thr Phe Asp Lys Asn Pro Asn Ala Glu Phe Val Thr
1045 1050 1055

Leu Met Arg Asp Pro Gln Ala Leu Gly Ser Glu Arg Gln Ala Lys Ile
1060 1065 1070

Thr Ser Glu Ile Asn Arg Leu Ala Val Thr Glu Val Leu Ser Ile Ala
1075 1080 1085

Pro Asn Lys Ile Phe Ser Lys Ser Ala Gln His Tyr Thr Thr Thr Glu
1090 1095 1100

Ile Asp Leu Asn Asp Ile Met Gln Asn Ile Glu Pro Thr Tyr Pro His
1105 1110 1115 1120

Gly Leu Arg Val Val Tyr Glu Ser Leu Pro Phe Tyr Lys Ala Glu Lys
1125 1130 1135

Ile Val Asn Leu Ile Ser Gly Thr Lys Ser Ile Thr Asn Ile Leu Glu
1140 1145 1150

Lys Thr Ser Ala Ile Asp Thr Thr Asp Ile Asn Arg Ala Thr Asp Met
1155 1160 1165

Met Arg Lys Asn Ile Thr Leu Leu Ile Arg Ile Leu Pro Leu Asp Cys
1170 1175 1180

Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser Ile Thr
1185 1190 1195 1200

Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn Ile
1205 1210 1215

Val Gly Val Thr Ser Pro Ser Ile Met Phe Thr Met Asn Ile Lys Tyr
1220 1225 1230

Thr Thr Ser Thr Ile Ala Ser Gly Ile Ile Ile Glu Lys Tyr Asn Val
1235 1240 1245

Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly
1250 1255 1260

Ser Ser Thr Gln Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gln Val
1265 1270 1275 1280

Leu Thr Lys Lys Gln Arg Asp Gln Ile Asp Leu Leu Ala Lys Leu Asp
1285 1290 1295

Trp Val Tyr Ala Ser Ile Asp Asn Lys Asp Glu Phe Met Glu Glu Leu
1300 1305 1310

Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe
1315 1320 1325

Pro Gln Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser
1330 1335 1340

Arg Pro Cys Glu Phe Pro Ala Ser Ile Pro Ala Tyr Arg Thr Thr Asn
1345 1350 1355 1360

Tyr His Phe Asp Thr Ser Pro Ile Asn His Val Leu Thr Glu Lys Tyr
1365 1370 1375

Gly Asp Glu Asp Ile Asp Ile Val Phe Gln Asn Cys Ile Ser Phe Gly
1380 1385 1390

Leu Ser Leu Met Ser Val Val Glu Gln Phe Thr Asn Ile Cys Pro Asn
1395 1400 1405

Arg Ile Ile Leu Ile Pro Lys Leu Asn Glu Ile His Leu Met Lys Pro
1410 1415 1420

Pro Ile Phe Thr Gly Asp Val Asp Ile Ile Lys Leu Lys Gln Val Ile
1425 1430 1435 1440

Gln Lys Gln His Met Phe Leu Pro Asp Lys Ile Ser Leu Thr Gln Tyr
1445 1450 1455

Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His Ile
1460 1465 1470

Asn Ser Asn Leu Ile Leu Val His Lys Met Ser Asp Tyr Phe His Asn
1475 1480 1485

Ala Tyr Ile Leu Ser Thr Asn Leu Ala Gly His Trp Ile Leu Ile Ile
1490 1495 1500

Gln Leu Met Lys Asp Ser Lys Gly Ile Phe Glu Lys Asp Trp Gly Glu
1505 1510 1515 1520

Gly Tyr Ile Thr Asp His Met Phe Ile Asn Leu Asn Val Phe Phe Asn
1525 1530 1535

Ala Tyr Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys Ala
1540 1545 1550

Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu
1555 1560 1565

Leu Ile Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu
1570 1575 1580

Gln Lys Val Ile Lys Tyr Ile Val Asn Gln Asp Thr Ser Leu Arg Arg
1585 1590 1595 1600

Ile Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asn
1605 1610 1615

Asn Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn Ile Asp Tyr His
1620 1625 1630

Pro Thr His Met Lys Ala Ile Leu Ser Tyr Ile Asp Leu Val Arg Met
1635 1640 1645

Gly Leu Ile Asn Val Asp Lys Leu Thr Ile Lys Asn Lys Asn Lys Phe
1650 1655 1660

Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr Ile Ser Tyr Asn Phe
1665 1670 1675 1680

Ser Asp Asn Thr His Leu Leu Thr Lys Gln Ile Arg Ile Ala Asn Ser
1685 1690 1695

Glu Leu Glu Asp Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr
1700 1705 1710

Leu Glu Asn Met Ser Leu Ile Pro Val Lys Ser Asn Asn Ser Asn Lys
1715 1720 1725

Pro Lys Phe Cys Ile Ser Gly Asn Thr Glu Ser Met Met Met Ser Thr
1730 1735 1740

Phe Ser Ser Lys Met His Ile Lys Ser Ser Thr Val Thr Thr Arg Phe
1745 1750 1755 1760

Asn Tyr Ser Lys Gln Asp Leu Tyr Asn Leu Phe Pro Ile Val Val Ile
1765 1770 1775

Asp Lys Ile Ile Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gln Leu
1780 1785 1790

Tyr Thr Thr Thr Ser His Gln Thr Ser Leu Val Arg Asn Ser Ala Ser
1795 1800 1805

Leu Tyr Cys Met Leu Pro Trp His His Val Asn Arg Phe Asn Phe Val
1810 1815 1820

Phe Ser Ser Thr Gly Cys Lys Ile Ser Ile Glu Tyr Ile Leu Lys Asp
1825 1830 1835 1840

Leu Lys Ile Lys Asp Pro Ser Cys Ile Ala Phe Ile Gly Glu Gly Ala
1845 1850 1855

Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp Ile Arg
1860 1865 1870

Tyr Ile Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro Ile
1875 1880 1885

Glu Phe Leu Arg Leu Tyr Asn Gly His Ile Asn Ile Asp Tyr Gly Glu
1890 1895 1900

Asn Leu Thr Ile Pro Ala Thr Asp Ala Thr Asn Asn Ile His Trp Ser
1905 1910 1915 1920

Tyr Leu His Ile Lys Phe Ala Glu Pro Ile Ser Ile Phe Val Cys Asp
1925 1930 1935

Ala Glu Leu Pro Val Thr Ala Asn Trp Ser Lys Ile Ile Ile Glu Trp
1940 1945 1950

Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys
1955 1960 1965

Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu
1970 1975 1980

Asp Asn Ile Thr Ile Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu
1985 1990 1995 2000

Lys Gly Ser Glu Val Tyr Leu Ile Leu Thr Ile Gly Pro Ala Asn Ile
2005 2010 2015

Leu Pro Val Phe Asp Val Val Gln Asn Ala Lys Leu Ile Leu Ser Arg
2020 2025 2030

Thr Lys Asn Phe Ile Met Pro Lys Lys Thr Asp Lys Glu Ser Ile Asp
2035 2040 2045

Ala Asn Ile Lys Ser Leu Ile Pro Phe Leu Cys Tyr Pro Ile Thr Lys
2050 2055 2060

Lys Gly Ile Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Asn Gly
2065 2070 2075 2080

Asp Ile Leu Ser Tyr Ser Ile Ala Gly Arg Asn Glu Val Phe Ser Asn
2085 2090 2095

Lys Leu Ile Asn His Lys His Met Asn Ile Leu Lys Trp Leu Asp His
2100 2105 2110

Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met
2115 2120 2125

Ile Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr
2130 2135 2140

Thr Asn Glu Leu Lys Lys Leu Ile Lys Ile Thr Gly Ser Val Leu Tyr
2145 2150 2155 2160

Asn Leu Pro Asn Glu Gln
2165

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15219 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 
ACGGGAAAAA AATGCGTACT ACAAACTTGC ACATTCGAAA AAAATGGGGC AAATAAGAAC 
TTGATAAGTG CTATTTAAGT CTAACCTTTT CAATCAGAAA TGGGGTGCAA TTCACTGAGC 
ATGATAAAGG TTAGATTACA AAATTTATTT GACAATGACG AAGTAGCATT GTTAAAAATA 
ACATGTTATA CTGATAAATT AATTCTTCTG ACCAATGCAT TAGCCAAAGC AGCAATACAT 
ACAATTAAAT TAAACGGCAT AGTTTTTATA CATGTTATAA CAAGCAGTGA AGTGTGCCCT 
GATAACAATA TTGTAGTGAA ATCTAACTTT ACAACAATGC CAATACTACA AAATGGAGGA 
TACATATGGG AATTGATTGA GTTGACACAC TGCTCTCAAT TAAACGGTTT AATGGATGAT 
AATTGTGAAA TCAAATTTTC TAAAAGACTA AGTGACTCAG TAATGACTAA TTATATGAAT 
CAAATATCTG ACTTACTTGG GCTTGATCTC AATTCATGAA TTATGTTTAG TCTAATTCAA 
TAGACATGTG TTTATTACCA TTTTAGTTAA TATAAAAACT CATCAAAGGG AAATGGGGCA 
AATAAACTCA CCTAATCAAT CAAACCATGA GCACTACAAA TGACAACACT ACTATGCAAA 
GATTGATGAT CACAGACATG AGACCCCTGT CAATGGATTC AATAATAACA TCTCTTACCA 
AAGAAATCAT CACACACAAA TTCATATACT TGATAAACAA TGAATGTATT GTAAGAAAAC 
TTGATGAAAG ACAAGCTACA TTTACATTCT TAGTCAATTA TGAGATGAAG CTACTGCACA 
AAGTAGGGAG TACCAAATAC AAAAAATACA CTGAATATAA TACAAAATAT GGCACTTTCC 
CCATGCCTAT ATTTATCAAT CACGGCGGGT TTCTAGAATG TATTGGCATT AAGCCTACAA 
AACACACTCC TATAATATAC AAATATGACC TCAACCCGTG AATTCCAACA AAAAAACCAA 
CCCAACCAAA CCAAACTATT CCTCAAACAA CAGTGCTCAA TAGTTAAGAA GGAGCTAATC 
CATTTTAGTA ATTAAAAATA AAAGTAAAGC CAATAACATA AATTGGGGCA AATACAAAGA 
TGGCTCTTAG CAAAGTCAAG TTGAATGATA CATTAAATAA GGATCAGCTG CTGTCATCCA
GCAAATACAC TATTCAACGT AGTACAGGAG ATAATATTGA CACTCCCAAT TATGATGTCC
AAAAACACCT AAACAAACTA TGTGGTATGC TATTAATCAC TGAAGATGCA AATCATAAAT 
TCACAGGATT AATAGGTATG TTATATGCTA TGTCCAGGTT AGGAAGGGAA GACACTATAA 
AGATACTTAA AGATGCTGGA TATCATGTTA AAGCTAATGG AGTAGATATA ACAACATATC 
GTCAAGATAT AAATGGAAAG GAAATGAAAT TCGAAGTATT AACATTATCA AGCTTGACAT 
CAGAAATACA AGTCAATATT GAGATAGAAT CTAGAAAGTC CTACAAAAAA ATGCTAAAAG 
AGATGGGAGA AGTGGCTCCA GAATATAGGC ATGATTCTCC AGACTGTGGG ATGATAATAC 
TGTGTATAGC TGCACTTGTG ATAACCAAAT TAGCAGCAGG AGACAGATCA GGTCTTACAG 
CAGTAATTAG GAGGGCAAAC AATGTCTTAA AAAACGAAAT AAAACGATAC AAGGGCCTCA 
TACCAAAGGA TATAGCTAAC AGTTTTTATG AAGTGTTTGA AAAACACCCT CATCTTATAG 
ATGTTTTCGT GCACTTTGGC ATTGCACAAT CATCCACAAG AGGGGGTAGT AGAGTTGAAG 
GAATCTTTGC AGGATTGTTT ATGAATGCCT ATGGTTCAGG GCAAGTAATG CTAAGATGGG 
GAGTTTTAGC CAAATCTGTA AAAAATATCA TGCTAGGACA TGCTAGTGTC CAGGCAGAAA 
TGGAGCAAGT TGTGGAAGTC TATGAGTATG CACAGAAGTT GGGAGGAGAA GCTGGATTCT 
ACCATATATT GAACAATCCA AAAGCATCAT TGCTGTCATT AACTCAATTT CCCAACTTCT 
CAAGTGTGGT CCTAGGCAAT GCAGCAGGTC TAGGCATAAT GGGAGAGTAT AGAGGTACAC 
CAAGAAACCA GGATCTTTAT GATGCAGCTA AAGCATATGC AGAGCAACTC AAAGAAAATG 
GAGTAATAAA CTACAGTGTA TTAGACTTAA CAGCAGAAGA ATTGGAAGCC ATAAAGCATC 
AACTCAACCC CAAAGAAGAT GATGTAGAGC TTTAAGTTAA CAAAAAATAC GGGGCAAATA 
AGTCAACATG GAGAAGTTTG CACCTGAATT TCATGGAGAA GATGCAAATA ACAAAGCTAC 
CAAATTCCTA GAATCAATAA AGGGCAAGTT CGCATCATCC AAAGATCCTA AGAAGAAAGA 
TAGCATAATA TCTGTTAACT CAATAGATAT AGAAGTAACT AAAGAGAGCC CGATAACATC 
TGGCACCAAC ATCATCAATC CAACAAGTGA AGCCGACAGT ACCCCAGAAA CAAAAGCCAA 
CTACCCAAGA AAACCCCTAG TAAGCTTCAA AGAAGATCTC ACCCCAAGTG ACAACCCTTT 
TTCTAAGTTG TACAAGGAAA CAATAGAAAC ATTTGATAAC AATGAAGAAG AATCTAGCTA
CTCATATGAA GAGATAAATG ATCAAACAAA TGACAACATT ACAGCAAGAC TAGATAGAAT
TGATGAAAAA TTAAGTGAAA TATTAGGAAT GCTCCATACA TTAGTAGTTG CAAGTGCAGG
ACCCACTTCA GCTCGCGATG GAATAAGAGA TGCTATGGTT GGTCTAAGAG AAGAGATGAT 
AGAAAAAATA AGAGCGGAAG CATTAATGAC CAATGATAGG TTAGAGGCTA TGGCAAGACT 
TAGGAATGAG GAAAGCGAAA AAATGGCAAA AGACACCTCA GATGAAGTGT CTCTTAATCC 
AACTTCCAAA AAATTGAGTG ACTTGTTGGA AGACAACGAT AGTGACAATG ATCTATCACT 
TGATGATTTT TGATCAGCGA TCAACTCACT CAGCAATCAA CAACATCAAT AAAACAGACA 
TCAATCCATT GAATCAACTG CCAGACCGAA CAAACAAACG TCCATCAGTA GAACCACCAA 
CCAATCAATC AACCAATTGA TCAATCAGCA ACCCGACAAA ATTAACAATA TAGTAACAAA 
AAAAGAACAA GATGGGGCAA ATATGGAAAC ATACGTGAAC AAGCTTCACG AAGGCTCCAC 
ATACACAGCA GCTGTTCAGT ACAATGTTCT AGAAAAAGAT GATGATCCTG CATCACTAAC 
AATATGGGTG CCTATGTTCC AGTCATCTGT GCCAGCAGAC TTGCTCATAA AAGAACTTGC 
AAGCATCAAT ATACTAGTGA AGCAGATCTC TACGCCCAAA GGACCTTCAC TACGAGTCAC 
GATTAACTCA AGAAGTGCTG TGCTGGCTCA AATGCCTAGT AATTTCATCA TAAGCGCAAA 
TGTATCATTA GATGAAAGAA GCAAATTAGC ATATGATGTA ACTACACCTT GTGAAATCAA
AGCATGCAGT CTAACATGCT TAAAAGTAAA AAGTATGTTA ACTACAGTCA AAGATCTTAC 
CATGAAGACA TTCAACCCCA CTCATGAGAT CATTGCTCTA TGTGAATTTG AAAATATTAT 
GACATCAAAA AGAGTAATAA TACCAACCTA TCTAAGATCA ATTAGTGTCA AGAACAAGGA 
TCTGAACTCA CTAGAAAATA TAGCAACCAC CGAATTCAAA AATGCTATCA CCAATGCAAA 
AATTATTCCT TATGCAGGAT TAGTGTTAGT TATCACAGTT ACTGACAATA AAGGAGCATT 
CAAATATATC AAACCACAGA GTCAATTTAT AGTAGATCTT GGTGCCTACC TAGAAAAAGA 
GAGCATATAT TATGTGACTA CTAATTGGAA GCATACAGCT ACACGTTTTT CAATCAAACC 
ACTAGAGGAT TAAACTTAAT TATCAACACT GAATGACAGG TCCACATATA TCCTCAAACT 
ACACACTATA TCCAAACATC ATAAACATCT ACACTACACA CTTCATCACA CAAACCAATC 
CCACTCAAAA TCCAAAATCA CTACCAGCCA CTATCTGCTA GACCTAGAGT GCGAATAGGT 
AAATAAAACC AAAATATGGG GTAAATAGAC ATTAGTTAGA GTTCAATCAA TCTTAACAAC 
CATTTATACC GCCAATTCAA CACATATACT ATAAATCTTA AAATGGGAAA TACATCCATC
ACAATAGAAT TCACAAGCAA ATTTTGGCCC TATTTTACAC TAATACATAT GATCTTAACT
CTAATCTTTT TACTAATTAT AATCACTATT ATGATTGCAA TACTAAATAA GCTAACTGAA
CATAAAGCAT TCTGTAACAA AACTCTTGAA CTAGGACAGA TGTATCAAAT CAACACATAG 
AGTTCTACCA TTATGCTGTG TCAAATTATA ATCCTGTATA TATAAACAAA CAAATCCAAT 
CTTCTCACAG AGTCATGGTG TCGCAAAACC ACGCTAACTA TCATGGTAGC ATAGAGTAGT 
TATTTAAAAA TTAACATAAT GATGAATTGT TAGTATGAGA TCAAAAACAA CATTGGGGCA 
AATGCAACCA TGTCCAAACA CAAGAATCAA CGCACTGCCA GGACTCTAGA AAAGACCTGG 
GATACTCTTA ATCATCTAAT TGTAATATCC TCTTGTTTAT ACAGATTAAA TTTAAAATCT 
ATAGCACAAA TAGCACTATC AGTTTTGGCA ATGATAATCT CAACCTCTCT CATAATTGCA 
GCCATAATAT TCATCATCTC TGCCAATCAC AAAGTTACAC TAACAACGGT CACAGTTCAA 
ACAATAAAAA ACCACACTGA AAAAAACATC ACCACCTACC CTACTCAAGT CTCACCAGAA 
AGGGTTAGTT CATCCAAGCA ACCCACAACC ACATCACCAA TCCACACAAG TTCAGCTACA 
ACATCACCCA ATACAAAATC AGAAACACAC CATACAACAG CACAAACCAA AGGCAGAACC 
ACCACTTCAA CACAGACCAA CAAGCCAAGC ACAAAACCAC GTCCAAAAAA TCCACCAAAA 
AAAGATGATT ACCATTTTGA AGTGTTCAAC TTCGTTCCCT GCAGTATATG TGGCAACAAT
CAACTTTGCA AATCCATCTG CAAAACAATA CCAAGCAACA AACCAAAGAA GAAACCAACC 
ATCAAACCCA CAAACAAACC AACCACCAAA ACCACAAACA AAAGAGACCC AAAAACACCA 
GCCAAAACGA CGAAAAAAGA AACTACCACC AACCCAACAA AAAAACTAAC CCTCAAGACC 
ACAGAAAGAG ACACCAGCAC CTCACAATCC ACTGCACTCG ACACAACCAC ATTAAAACAC 
ACAGTCCAAC AGCAATCCCT CCTCTCAACC ACCCCCGAAA ACACACCCAA CTCCACACAA 
ACACCCACAG CATCCGAGCC CTCCACACCA AACTCCACCC AAAAAACCCA GCCACATGCT 
TAGTTATTCA AAAACTACAT CTTAGCAGAG AACCGTGATC TATCAAGCAA GAACGAAATT 
AAACCTGGGG CAAATAACCA TGGAGTTGAT GATCCACAAG TCAAGTGCAA TCTTCCTAAC 
TCTTGCTATT AATGCATTGT ACCTCACCTC AAGTCAGAAC ATAACTGAGG AGTTTTACCA 
ATCGACATGT AGTGCAGTTA GCAGAGGTTA TTTTAGTGCT TTAAGAACAG GTTGGTATAC 
TAGTGTCATA ACAATAGAAT TAAGTAATAT AAAAGAAACC AAATGCAATG GAACTGACAC
TAAAGTAAAA CTTATGAAAC AAGAATTAGA TAAGTATAAG AATGCAGTAA CAGAATTACA
GCTACTTATG CAAAACACAC CAGCTGTCAA CAACCGGGCC AGAAGAGAAG CACCACAGTA 
TATGAACTAC ACAATCAATA CCACTAAAAA CCTAAATGTA TCAATAAGCA AGAAGAGGAA 
ACGAAGATTT CTAGGCTTCT TGTTAGGTGT GGGATCTGCA ATAGCAAGTG GTATAGCTGT
ATCAAAAGTT CTACACCTTG AAGGAGAAGT GAACAAGATC AAAAATGCTT TGTTGTCTAC 
AAACAAAGCT GTAGTCAGTT TATCAAATGG GGTCAGTGTT TTAACCAGCA AAGTGTTAGA
TCTCAAGAAT TACATAAATA ACCAATTATT ACCCATAGTA AATCAACAGA GCTGTCGCAT 
CTCCAACATT GAAACAGTTA TAGAATTCCA GCAGAAGAAC AGCAGATTGT TGGAAATCAC 
CAGAGAATTT AGTGTCAATG CAGGTGTAAC AACACCTTTA AGCACTTACA TGTTGACAAA 
CAGTGAGTTA CTATCATTAA TCAATGATAT GCCTATAACA AATGATCAGA AAAAATTAAT 
GTCAAGCAAT GTTCAGATAG TAAGGCAACA AAGTTATTCC ATCATGTCTA TAATAAAGGA 
AGAAGTCCTT GCATATGTTG TACAGCTGCC TATCTATGGT GTAATAGATA CACCTTGCTG 
GAAATTGCAC ACATCGCCTC TATGCACTAC CAACATCAAA GAAGGATCAA ATATTTGTTT 
AACAAGGACT GATAGAGGAT GGTATTGTGA TAATGCAGGA TCAGTATCCT TCTTTCCACA 
GGCTGACACT TGTAAAGTAC AGTCCAATCG AGTATTTTGT GACACTATGA ACAGTTTGAC 
ATTACCAAGT GAAGTCAGCC TTTGTAACAC TGACATATTC AATTCCAAGT ATGACTGCAA 
AATTATGACA TCAAAAACAG ACATAAGCAG CTCAGTAATT ACTTCTCTTG GAGCTATAGT 
GTCATGCTAT GGTAAAACTA AATGCACTGC ATCCAACAAA AATCGTGGGA TTATAAAGAC 
ATTTTCTAAT GGTTGTGACT ATGTGTCAAA CAAAGGAGTA GATACTGTGT CAGTGGGCAA 
CACTTTATAC TATGTAAACA AGCTGGAAGG CAAGAACCTT TATGTAAAAG GGGAACCTAT 
AATAAATTAC TATGACCCTC TAGTGTTTCC TTCTGATGAG TTTGATGCAT CAATATCTCA 
AGTCAATGAA AAAATCAATC AAAGTTTAGC TTTTATTCGT AGATCTGATG AATTACTACA 
TAATGTAAAT ACTGGCAAAT CTACTACAAA TATTATGATA ACTACAATTA TTATAGTAAT 
CATTGTAGTA TTGTTATCAT TAATAGCTAT TGGTTTACTG TTGTATTGTA AAGCCAAAAA 
CACACCAGTT ACACTAAGCA AAGACCAACT AAGTGGAATC AATAATATTG CATTCAGCAA 
ATAGACAAAA AACCACCTGA TCATGTTTCA ACAACAATCT GCTGACCACC AATCCCAAAT
CAACTTACAA CAAATATTTC AACATCACAG TACAGGCTGA ATCATTTCCT CACATCATGC
TACCCACATA ACTAAGCTAG ATCCTTAACT TATAGTTACA TAAAAACCTC AAGTATCACA 
ATCAACCACT AAATCAACAC ATCATTCACA AAATTAACAG CTGGGGCAAA TATGTCGCGA
AGAAATCCTT GTAAATTTGA GATTAGAGGT CATTGCTTGA ATGGTAGAAG ATGTCACTAC 
AGTCATAATT ACTTTGAATG GCCTCCTCAT GCATTACTAG TGAGGCAAAA CTTCATGTTA 
AACAAGATAC TCAAGTCAAT GGACAAAAGC ATAGACACTT TGTCTGAAAT AAGTGGAGCT
GCTGAACTGG ATAGAACAGA AGAATATGCT CTTGGTATAG TTGGAGTGCT AGAGAGTTAC
ATAGGATCTA TAAACAACAT AACAAAACAA TCAGCATGTG TTGCTATGAG TAAACTTCTT 
ATTGAGATCA ATAGTGATGA CATTAAAAAG CTTAGAGATA ATGAAGAACC CAATTCACCT 
AAGATAAGAG TGTACAATAC TGTTATATCA TACATTGAGA GCAATAGAAA AAACAACAAG 
CAAACCATCC ATCTGCTCAA GAGACTACCA GCAGACGTGC TGAAGAAGAC AATAAAGAAC 
ACATTAGATA TCCACAAAAG CATAACCATA AGCAATCCAA AAGAGTCAAC TGTGAATGAT 
CAAAATGACC AAACCAAAAA TAATGATATT ACCGGATAAA TATCCTTGTA GTATATCATC
CATATTGATC TCAAGTGAAA GCATGGTTGC TACATTCAAT CATAAAAACA TATTACAATT 
TAACCATAAC TATTTGGATA ACCACCAGCG TTTATTAAAT CATATATTTG ATGAAATTCA 
TTGGACACCT AAAAACTTAT TAGATGCCAC TCAACAATTT CTCCAACATC TTAACATCCC 
TGAAGATATA TATACAGTAT ATATATTAGT GTCATAATGC TTGACCATAA CGACTCTATG 
TCATCCAACC ATAAAACTAT TTTGATAAGG TTATGGGACA AAATGGATCC CATTATTAAT 
GGAAACTCTG CTAATGTGTA TCTAACTGAT AGTTATTTAA AAGGTGTTAT CTCTTTTTCA 
GAGTGTAATG CTTTAGGGAG TTATCTTTTT AACGGCCCTT ATCTTAAAAA TGATTACACC 
AACTTAATTA GTAGACAAAG CCCACTACTA GAGCATATGA ATCTTAAAAA ACTAACTATA 
ACACAGTCAT TAATATCTAG ATATCATAAA GGTGAACTGA AATTAGAAGA ACCAACTTAT 
TTCCAGTCAT TACTTATGAC ATATAAAAGT ATGTCCTCGT CTGAACAAAT TGCTACAACT 
AACTTACTTA AAAAAATAAT ACGAAGAGCC ATAGAAATAA GTGATGTAAA GGTGTACGCC 
ATCTTGAATA AACTAGGATT AAAGGAAAAG GACAGAGTTA AGCCCAACAA TAATTCAGGT 
GATGAAAACT CAGTACTTAC AACTATAATT AAAGATGATA TACTTTCGGC TGTGGAAAAC
AATCAATCAT ATACAAATTC AGACAAAAGT CACTCAGTAA ATCAAAATAT CACTATCAAA
ACAACACTCT TGAAAAAATT GATGTGTTCA ATGCAACATC CTCCATCATG GTTAATACAC 
TGGTTCAATT TATATACAAA ATTAAATAAC ATATTAACAC AATATCGATC AAATGAGGTA 
AAAAGTCATG GGTTTATATT AATAGATAAT CAAACTTTAA GTGGTTTTCA GTTTATTTTA 
AATCAATATG GTTGTATCGT TTATCATAAA GGACTCAAAA AAATCACAAC TACTACTTAC 
AATCAATTTT TGACATGGAA AGACATCAGC CTTAGCAGAT TAAATGTTTG CTTAATTACT 
TGGATAAGTA ATTGTTTAAA TACATTAAAC AAAAGCTTAG GGCTGAGATG TGGATTCAAT 
AATGTTGTGT TATCACAATT ATTTCTTTAT GGAGATTGTA TACTGAAATT ATTTCATAAT
GAAGGCTTCT ACATAATAAA AGAAGTAGAG GGATTTATTA TGTCTTTAAT TCTAAACATA 
ACAGAAGAAG ATCAATTTAG GAAACGATTT TATAATAGCA TGCTAAATAA CATCACAGAT 
GCAGCTATTA AGGCTCAAAA GGACCTACTA TCAAGAGTAT GTCACACTTT ATTAGACAAG 
ACAGTGTCTG ATAATATCAT AAATGGTAAA TGGATAATCC TATTAAGTAA ATTTCTTAAA 
TTGATTAAGC TTGCAGGTGA TAATAATCTC AATAACTTGA GTGAGCTATA TTTTCTCTTC 
AGAATCTTTG GACATCCAAT GGTCGATGAA AGACAAGCAA TGGATTCTGT AAGAATTAAC 
TGTAATGAAA CTAAGTTCTA CTTATTAAGT AGTCTAAGTA CATTAAGAGG TGCTTTCATT 
TATAGAATCA TAAAAGGGTT TGTAAATACC TACAACAGAT GGCCCACCTT AAGGAATGCT 
ATTGTCCTAC CTCTAAGATG GTTAAACTAC TATAAACTTA ATACTTATCC ATCTCTACTT 
GAAATCACAG AAAATGATTT GATTATTTTA TCAGGATTGC GGTTCTATCG TGAGTTTCAT 
CTGCCTAAAA AAGTGGATCT TGAAATGATA ATAAATGACA AAGCCATTTC ACCTCCAAAA 
GATCTAATAT GGACTAGTTT TCCTAGAAAT TACATGCCAT CACATATACA AAATTATATA 
GAACATGAAA AGTTGAAGTT CTCTGAAAGC GACAGATCGA GAAGAGTACT AGAGTATTAC 
TTGAGAGATA ATAAATTCAA TGAATGCGAT CTATACAATT GTGTAGTCAA TCAAAGCTAT 
CTCAACAACT CTAATCACGT GGTATCACTA ACTGGTAAAG AAAGAGAGCT CAGTGTAGGT 
AGAATGTTTG CTATGCAACC AGGTATGTTT AGGCAAATCC AAATCTTAGC AGAGAAAATG 
ATAGCTGAAA ATATTTTACA ATTCTTCCCT GAGAGTTTGA CAAGATATGG TGATCTAGAG
CTTCAAAAGA TATTAGAATT AAAAGCAGGA ATAAGCAACA AGTCAAATCG TTATAATGAT
AACTACAACA ATTATATCAG TAAATGTTCT ATCATTACAG ATCTTAGCAA ATTCAATCAG 10620

GCATTTAGAT ATGAAACATC ATGTATCTGC AGTGATGTAT TAGATGAACT GCATGGAGTA 10680 

CAATCTCTGT TCTCTTGGTT GCATTTAACA ATACCTCTTG TCACAATAAT ATGTACATAT 10740 

AGACATGCAC CTCCTTTCAT AAAGGATCAT GTTGTTAATC TTAATGAGGT TGATGAACAA 10800 

AGTGGATTAT ACAGATATCA TATGGGTGGT ATTGAGGGCT GGTGTCAAAA ACTGTGGACC 10860 

ATTGAAGCTA TATCATTATT AGATCTAATA TCTCTCAAAG GGAAATTCTC TATCACAGCT 10920 

CTGATAAATG GTGATAATCA GTCAATTGAT ATAAGCAAAC CAGTTAGACT TATAGAGGGT 10980 

CAGACCCATG CACAAGCAGA TTATTTGTTA GCATTAAATA GCCTTAAATT GTTATATAAA 11040 

GAGTATGCAG GTATAGGCCA TAAGCTTAAG GGAACAGAGA CCTATATATC CCGAGATATG 11100 

CAGTTCATGA GCAAAACAAT CCAGCACAAT GGAGTGTACT ATCCAGCCAG TATCAAAAAA 11160 

GTCCTGAGAG TAGGTCCATG GATAAACACG ATACTTGATG ATTTTAAAGT TAGTTTAGAA 11220 

TCTATAGGCA GCTTAACACA GGAGTTAGAA TACAGAGGAG AAAGCTTATT ATGCAGTTTA 11280

ATATTTAGGA ACATTTGGTT ATACAATCAA ATTGCTTTGC AACTCCGAAA TCATGCATTA 11340 

TGTAACAATA AGCTATATTT AGATATATTG AAAGTATTAA AACACTTAAA AACTTTTTTT 11400 

AATCTTGATA GCATTGATAT GGCTTTATCA TTGTATATGA ATTTGCCTAT GCTGTTTGGT 11460 

GGTGGTGATC CTAATTTGTT ATATCGAAGC TTTTATAGGA GAACTCCAGA CTTCCTTACA 11520 

GAAGCTATAG TACATTCAGT GTTTGTGTTG AGCTATTATA CTGGTCACGA TTTACAAGAT 11580

AAGCTCCAGG ATCTTCCAGA TGATAGACTG AACAAATTCT TGACATGTGT CATCACATTT 11640 

GATAAAAATC CCAATGCCGA GTTTGTAACA TTGATGAGGG ATCCACAGGC TTTAGGGTCT 11700 

GAAAGGCAAG CTAAAATTAC TAGTGAGATT AATAGATTAG CAGTAACAGA AGTCTTAAGT 11760

ATAGCCCCAA ACAAAATATT TTCTAAAAGT GCACAACATT ATACTACCAC TGAGATTGAT 11820 

CTAAATGACA TTATGCAAAA TATAGAACCA ACTTACCCTC ATGGATTAAG AGTTGTTTAT 11880 

GAAAGTTTAC CTTTTTATAA AGCAGAAAAA ATAGTTAATC TTATATCAGG AACAAAATCC 11940 

ATAACTAATA TACTTGAAAA AACATCAGCA ATAGATACAA CTGATATTAA TAGGGCTACT 12000 

GATATGATGA GGAAAAATAT AACTTTACTT ATAAGGATAC TTCCACTAGA TTGTAACAAA 12060 

GACAAAAGAG AGTTATTAAG TTTAGAAAAT CTTAGTATAA CTGAATTAAG CAAGTATGTA 12120
AGAGAAAGAT CTTGGTCATT ATCCAATATA GTAGGAGTAA CATCGCCAAG TATTATGTTC
ACAATGGACA TTAAATATAC AACTAGCACT ATAGCCAGTG GTATAATAAT AGAAAAATAT 
AATGTTAATA GTTTAACTCG TGGTGAAAGA GGACCCACCA AGCCATGGGT AGGCTCATCC 
ACGCAGGAGA AAAAAACAAT GCCAGTGTAC AACAGACAAG TTTTAACCAA AAAGCAAAGA 
GACCAAATAG ATTTATTAGC AAAATTAGAC TGGGTATATG CATCCATAGA CAACAAAGAT 
GAATTCATGG AAGAACTGAG TACTGGAACA CTTGGACTGT CATATGAAAA AGCCAAAAAG 
TTGTTTCCAC AATATCTAAG TGTCAATTAT TTACACCGTT TAACAGTCAG TAGTAGACCA 
TGTGAATTCC CTGCATCAAT ACCAGCTTAT AGAACAACAA ATTATCATTT TGATACTAGT 
CCTATCAATC ATGTATTAAC AGAAAAGTAT GGAGATGAAG ATATCGACAT TGTGTTTCAA 
AATTGCATAA GTTTTGGTCT TAGCCTGATG TCGGTTGTGG AACAATTCAC AAACATATGT
CCTAATAGAA TTATTCTCAT ACCGAAGCTG AATGAGATAC ATTTGATGAA ACCTCCTATA
TTTACAGGAG ATGTTGATAT CATCAAGTTG AAGCAAGTGA TACAAAAGCA GCACATGTTC 
CTACCAGATA AAATAAGTTT AACCCAATAT GTAGAATTAT TCTTAAGTAA CAAAGCACTT 
AAATCTGGAT CTCACATCAA CTCTAATTTA ATATTAGTAC ATAAAATGTC TGATTATTTT 
CATAATGCTT ATATTTTAAG TACTAATTTA GCTGGACATT GGATTCTGAT TATTCAACTT 
ATGAAAGATT CAAAAGGTAT TTTTGAAAAA GATTGGGGAG AGGGGTACAT AACTGATCAT 
ATGTTCATTA ATTTGAATGT TTTCTTTAAT GCTTATAAGA CTTATTTGCT ATGTTTTCAT 
AAAGGTTATG GTAAAGCAAA ATTAGAATGT GATATGAACA CTTCAGATCT TCTTTGTGTT 
TTGGAGTTAA TAGACAGTAG CTACTGGAAA TCTATGTCTA AAGTTTTCCT AGAAGAAAAA 
GTCATAAAAT ACATAGTCAA TCAAGACACA AGTTTGCGTA GAATAAAAGG CTGTCACAGT 
TTTAAGTTGT GGTTTTTAAA ACGCCTTGAT AATGCTAAAT TTACCGTATG CCCTTGGGTT 
GTTAACATAG ATTATCACCC AACACACATG AAAGCTATAT TATCTTACAT AGATTTAGTT 
AGAATGGGGT TAATAAATGT AGATAAATTA ACCATTAAAA ATAAAAACAA ATTCAATGAT 
GAATTTTACA CATCAAATCT CTTTTACATT AGTTATAACT TTTCAGACAA CACTCATTTG 
CTAACAAAAC AAATAAGAAT TGCTAATTCA GAATTAGAAG ATAATTATAA CAAACTATAT 
CACCCAACCC CAGAAACTTT AGAAAATATG TCATTAATTC CTGTTAAAAG TAATAATAGT
AACAAACCTA AATTTTGTAT AAGTGGAAAT ACCGAATCTA TGATGATGTC AACATTCTCT
AGTAAAATGC ATATTAAATC TTCCACTGTT ACCACAAGAT TCAATTATAG CAAACAAGAC 
TTGTACAATT TATTTCCAAT TGTTGTGATA GACAAGATTA TAGATCATTC AGGTAATACA 
GCAAAATCTA ACCAACTTTA CACCACCACT TCACATCAGA CATCTTTAGT AAGGAATAGT 
GCATCACTTT ATTGCATGCT TCCTTGGCAT CATGTCAATA GATTTAACTT TGTATTTAGT 
TCCACAGGAT GCAAGATCAG TATAGAGTAT ATTTTAAAAG ATCTTAAGAT TAAGGACCCC 
AGTTGTATAG CATTCATAGG TGAAGGAGCT GGTAACTTAT TATTACGTAC GGTAGTAGAA 
CTTCATCCAG ACATAAGATA CATTTACAGA AGTTTAAAAG ATTGCAATGA TCATAGTTTA
CCTATTGAAT TTCTAAGGTT ATACAACGGG CATATAAACA TAGATTATGG TGAGAATTTA 
ACCATTCCTG CTACAGATGC AACTAATAAC ATTCATTGGT CTTATTTACA TATAAAATTT 
GCAGAACCTA TTAGCATCTT TGTCTGCGAT GCTGAATTAC CTGTTACAGC CAATTGGAGX 
AAAATTATAA TTGAATGGAG TAAGCATGTA AGAAAGTGCA AGTACTGTTC TTCTGTAAAT 
AGATGCATTT TAATTGCAAA ATATCATGCT CAAGATGACA TTGATTTCAA ATTAGATAAC 
ATTACTATAT TAAAAACTTA CGTGTGCCTA GGTAGCAAGT TAAAAGGATC TGAAGTTTAC 
TTAATCCTTA CAATAGGCCC TGCAAATATA CTTCCTGTTT TTGATGTTGT ACAAAATGCT 
AAATTGATAC TTTCAAGAAC TAAAAATTTC ATTATGCCTA AAAAAACTGA CAAGGAATCT 
ATCGATGCAG TTATTAAAAG CTTAATACCT TTCCTTTGTT ACCCTATAAC AAAAAAAGGA 
ATTAAGACTT CATTGTCAAA ATTGAAGAGT GTAGTTAATG GAGATATATT ATCATATTCT
ATAGCTGGAC GTAATGAAGT ATTCAGCAAC AAGCTTATAA ACCACAAGCA TATGAATATC 
CTAAAATGGC TAGATCATGT TTTAAATTTT AGATCAGCTG AACTTAATTA CAATCATTTA 
TACATGATAG AGTCCACATA TCCTTACTTA AGTGAATTGT TAAATAGTTT AACAACCAAT 
GAGCTCAAGA AGCTGATTAA AATAACAGGT AGTGTGCTAT ACAACCTTCC CAACGAACAG 
TAGTTTAAAA TATCATTAAC AAGTTTGGTC AAATTTAGAT GCTAACACAT CATTATATTA 
TAGTTATTAA AAAATATACA AACTTTTCAA TAATTTAGCA TATTGATTCC AAAATTATCA 
TTTTAGTCTT AAGGGGTTAA ATAAAAGTCT AAAACTAACA ATTATACATG TGCATTCACA 
ACACAACGAG ACATTAGTTT TTGACACTTT TTTTCTCGT
(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2166 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu Thr Asp
1 5 10 15

Ser Tyr Leu Lys Gly Val Ile Ser Phe Ser Glu Cys Asn Ala Leu Gly
20 25 30

Ser Tyr Leu Phe Asn Gly Pro Tyr Leu Lys Asn Asp Tyr Thr Asn Leu
35 40 45

Ile Ser Arg Gln Ser Pro Leu Leu Glu His Met Asn Leu Lys Lys Leu
50 55 60

Thr Ile Thr Gln Ser Leu Ile Ser Arg Tyr His Lys Gly Glu Leu Lys
65 70 75 80

Leu Glu Glu Pro Thr Tyr Phe Gln Ser Leu Leu Met Thr Tyr Lys Ser
85 90 95

Met Ser Ser Ser Glu Gln Ile Ala Thr Thr Asn Leu Leu Lys Lys Ile
100 105 110

Ile Arg Arg Ala Ile Glu Ile Ser Asp Val Lys Val Tyr Ala Ile Leu
115 120 125

Asn Lys Leu Gly Leu Lys Glu Lys Asp Arg Val Lys Pro Asn Asn Asn
130 135 140

Ser Gly Asp Glu Asn Ser Val Leu Thr Thr Ile Ile Lys Asp Asp Ile
145 150 155 160

Leu Ser Ala Val Glu Asn Asn Gln Ser Tyr Thr Asn Ser Asp Lys Ser
165 170 175

His Ser Val Asn Gln Asn Ile Thr Ile Lys Thr Thr Leu Leu Lys Lys
180 185 190

Leu Met Cys Ser Met Gln His Pro Pro Ser Trp Leu Ile His Trp Phe
195 200 205

Asn Leu Tyr Thr Lys Leu Asn Asn Ile Leu Thr Gln Tyr Arg Ser Asn
210 215 220

Glu Val Lys Ser His Gly Phe Ile Leu Ile Asp Asn Gln Thr Leu Ser
225 230 235 240

Gly Phe Gln Phe Ile Leu Asn Gln Tyr Gly Cys Ile Val Tyr His Lys
245 250 255

Gly Leu Lys Lys Ile Thr Thr Thr Thr Tyr Asn Gln Phe Leu Thr Trp
260 265 270

Lys Asp Ile Ser Leu Ser Arg Leu Asn Val Cys Leu Ile Thr Trp Ile
275 280 285

Ser Asn Cys Leu Asn Thr Leu Asn Lys Ser Leu Gly Leu Arg Cys Gly
290 295 300

Phe Asn Asn Val Val Leu Ser Gln Leu Phe Leu Tyr Gly Asp Cys Ile
305 310 315 320

Leu Lys Leu Phe His Asn Glu Gly Phe Tyr Ile Ile Lys Glu Val Glu
325 330 335

Gly Phe Ile Met Ser Leu Ile Leu Asn Ile Thr Glu Glu Asp Gln Phe
340 345 350

Arg Lys Arg Phe Tyr Asn Ser Met Leu Asn Asn Ile Thr Asp Ala Ala
355 360 365

Ile Lys Ala Gln Lys Asp Leu Leu Ser Arg Val Cys His Thr Leu Leu
370 375 380

Asp Lys Thr Val Ser Asp Asn Ile Ile Asn Gly Lys Trp Ile Ile Leu
385 390 395 400

Leu Ser Lys Phe Leu Lys Leu Ile Lys Leu Ala Gly Asp Asn Asn Leu
405 410 415

Asn Asn Leu Ser Glu Leu Tyr Phe Leu Phe Arg Ile Phe Gly His Pro
420 425 430

Met Val Asp Glu Arg Gln Ala Met Asp Ser Val Arg Ile Asn Cys Asn
435 440 445

Glu Thr Lys Phe Tyr Leu Leu Ser Ser Leu Ser Thr Leu Arg Gly Ala
450 455 460

Phe Ile Tyr Arg Ile Ile Lys Gly Phe Val Asn Thr Tyr Asn Arg Trp
465 470 475 480

Pro Thr Leu Arg Asn Ala Ile Val Leu Pro Leu Arg Trp Leu Asn Tyr
485 490 495

Tyr Lys Leu Asn Thr Tyr Pro Ser Leu Leu Glu Ile Thr Glu Asn Asp
500 505 510

Leu Ile Ile Leu Ser Gly Leu Arg Phe Tyr Arg Glu Phe His Leu Pro
515 520 525

Lys Lys Val Asp Leu Glu Met Ile Ile Asn Asp Lys Ala Ile Ser Pro
530 535 540

Pro Lys Asp Leu Ile Trp Thr Ser Phe Pro Arg Asn Tyr Met Pro Ser
545 550 555 560

His Ile Gln Asn Tyr Ile Glu His Glu Lys Leu Lys Phe Ser Glu Ser
565 570 575

Asp Arg Ser Arg Arg Val Leu Glu Tyr Tyr Leu Arg Asp Asn Lys Phe
580 585 590

Asn Glu Cys Asp Leu Tyr Asn Cys Val Val Asn Gln Ser Tyr Leu Asn
595 600 605

Asn Ser Asn His Val Val Ser Leu Thr Gly Lys Glu Arg Glu Leu Ser
610 615 620

Val Gly Arg Met Phe Ala Met Gln Pro Gly Met Phe Arg Gln Ile Gln
625 630 635 640

Ile Leu Ala Glu Lys Met Ile Ala Glu Asn Ile Leu Gln Phe Phe Pro
645 650 655

Glu Ser Leu Thr Arg Tyr Gly Asp Leu Glu Leu Gln Lys Ile Leu Glu
660 665 670

Leu Lys Ala Gly Ile Ser Asn Lys Ser Asn Arg Tyr Asn Asp Asn Tyr
675 680 685

Asn Asn Tyr Ile Ser Lys Cys Ser Ile Ile Thr Asp Leu Ser Lys Phe
690 695 700

Asn Gln Ala Phe Arg Tyr Glu Thr Ser Cys Ile Cys Ser Asp Val Leu
705 710 715 720

Asp Glu Leu His Gly Val Gln Ser Leu Phe Ser Trp Leu His Leu Thr
725 730 735

Ile Pro Leu Val Thr Ile Ile Cys Thr Tyr Arg His Ala Pro Pro Phe
740 745 750

Ile Lys Asp His Val Val Asn Leu Asn Glu Val Asp Glu Gln Ser Gly
755 760 765

Leu Tyr Arg Tyr His Met Gly Gly Ile Glu Gly Trp Cys Gln Lys Leu
770 775 780

Trp Thr Ile Glu Ala Ile Ser Leu Leu Asp Leu Ile Ser Leu Lys Gly
785 790 795 800

Lys Phe Ser Ile Thr Ala Leu Ile Asn Gly Asp Asn Gln Ser Ile Asp
805 810 815

Ile Ser Lys Pro Val Arg Leu Ile Glu Gly Gln Thr His Ala Gln Ala
820 825 830

Asp Tyr Leu Leu Ala Leu Asn Ser Leu Lys Leu Leu Tyr Lys Glu Tyr
835 840 845

Ala Gly Ile Gly His Lys Leu Lys Gly Thr Glu Thr Tyr Ile Ser Arg
850 855 860

Asp Met Gln Phe Met Ser Lys Thr Ile Gln His Asn Gly Val Tyr Tyr
865 870 875 880

Pro Ala Ser Ile Lys Lys Val Leu Arg Val Gly Pro Trp Ile Asn Thr
885 890 895

Ile Leu Asp Asp Phe Lys Val Ser Leu Glu Ser Ile Gly Ser Leu Thr
900 905 910

Gln Glu Leu Glu Tyr Arg Gly Glu Ser Leu Leu Cys Ser Leu Ile Phe
915 920 925

Arg Asn Ile Trp Leu Tyr Asn Gln Ile Ala Leu Gln Leu Arg Asn His
930 935 940

Ala Leu Cys Asn Asn Lys Leu Tyr Leu Asp Ile Leu Lys Val Leu Lys
945 950 955 960

His Leu Lys Thr Phe Phe Asn Leu Asp Ser Ile Asp Met Ala Leu Ser
965 970 975

Leu Tyr Met Asn Leu Pro Met Leu Phe Gly Gly Gly Asp Pro Asn Leu
980 985 990

Leu Tyr Arg Ser Phe Tyr Arg Arg Thr Pro Asp Phe Leu Thr Glu Ala
995 1000 1005

Ile Val His Ser Val Phe Val Leu Ser Tyr Tyr Thr Gly His Asp Leu
1010 1015 1020

Gln Asp Lys Leu Gln Asp Leu Pro Asp Asp Arg Leu Asn Lys Phe Leu
1025 1030 1035 1040

Thr Cys Val Ile Thr Phe Asp Lys Asn Pro Asn Ala Glu Phe Val Thr
1045 1050 1055

Leu Met Arg Asp Pro Gln Ala Leu Gly Ser Glu Arg Gln Ala Lys Ile
1060 1065 1070

Thr Ser Glu Ile Asn Arg Leu Ala Val Thr Glu Val Leu Ser Ile Ala
1075 1080 1085

Pro Asn Lys Ile Phe Ser Lys Ser Ala Gln His Tyr Thr Thr Thr Glu
1090 1095 1100

Ile Asp Leu Asn Asp Ile Met Gln Asn Ile Glu Pro Thr Tyr Pro His
1105 1110 1115 1120

Gly Leu Arg Val Val Tyr Glu Ser Leu Pro Phe Tyr Lys Ala Glu Lys
1125 1130 1135

Ile Val Asn Leu Ile Ser Gly Thr Lys Ser Ile Thr Asn Ile Leu Glu
1140 1145 1150

Lys Thr Ser Ala Ile Asp Thr Thr Asp Ile Asn Arg Ala Thr Asp Met
1155 1160 1165

Met Arg Lys Asn Ile Thr Leu Leu Ile Arg Ile Leu Pro Leu Asp Cys
1170 1175 1180

Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser Ile Thr
1185 1190 1195 1200

Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn Ile
1205 1210 1215

Val Gly Val Thr Ser Pro Ser Ile Met Phe Thr Met Asp Ile Lys Tyr
1220 1225 1230

Thr Thr Ser Thr Ile Ala Ser Gly Ile Ile Ile Glu Lys Tyr Asn Val
1235 1240 1245

Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly
1250 1255 1260

Ser Ser Thr Gln Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gln Val
1265 1270 1275 1280

Leu Thr Lys Lys Gln Arg Asp Gln Ile Asp Leu Leu Ala Lys Leu Asp
1285 1290 1295

Trp Val Tyr Ala Ser Ile Asp Asn Lys Asp Glu Phe Met Glu Glu Leu
1300 1305 1310

Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe
1315 1320 1325

Pro Gln Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser
1330 1335 1340

Arg Pro Cys Glu Phe Pro Ala Ser Ile Pro Ala Tyr Arg Thr Thr Asn
1345 1350 1355 1360

Tyr His Phe Asp Thr Ser Pro Ile Asn His Val Leu Thr Glu Lys Tyr
1365 1370 1375

Gly Asp Glu Asp Ile Asp Ile Val Phe Gln Asn Cys Ile Ser Phe Gly
1380 1385 1390

Leu Ser Leu Met Ser Val Val Glu Gln Phe Thr Asn Ile Cys Pro Asn
1395 1400 1405

Arg Ile Ile Leu Ile Pro Lys Leu Asn Glu Ile His Leu Met Lys Pro
1410 1415 1420

Pro Ile Phe Thr Gly Asp Val Asp Ile Ile Lys Leu Lys Gln Val Ile
1425 1430 1435 1440

Gln Lys Gln His Met Phe Leu Pro Asp Lys Ile Ser Leu Thr Gln Tyr
1445 1450 1455

Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His Ile
1460 1465 1470

Asn Ser Asn Leu Ile Leu Val His Lys Met Ser Asp Tyr Phe His Asn
1475 1480 1485

Ala Tyr Ile Leu Ser Thr Asn Leu Ala Gly His Trp Ile Leu Ile Ile
1490 1495 1500

Gln Leu Met Lys Asp Ser Lys Gly Ile Phe Glu Lys Asp Trp Gly Glu
1505 1510 1515 1520

Gly Tyr Ile Thr Asp His Met Phe Ile Asn Leu Asn Val Phe Phe Asn
1525 1530 1535

Ala Tyr Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys Ala
1540 1545 1550

Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu
1555 1560 1565

Leu Ile Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu
1570 1575 1580

Gln Lys Val Ile Lys Tyr Ile Val Asn Gln Asp Thr Ser Leu Arg Arg
1585 1590 1595 1600

Ile Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asp
1605 1610 1615

Asn Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn Ile Asp Tyr His
1620 1625 1630

Pro Thr His Met Lys Ala Ile Leu Ser Tyr Ile Asp Leu Val Arg Met
1635 1640 1645

Gly Leu Ile Asn Val Asp Lys Leu Thr Ile Lys Asn Lys Asn Lys Phe
1650 1655 1660

Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr Ile Ser Tyr Asn Phe
1665 1670 1675 1680

Ser Asp Asn Thr His Leu Leu Thr Lys Gln Ile Arg Ile Ala Asn Ser
1685 1690 1695

Glu Leu Glu Asp Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr
1700 1705 1710

Leu Glu Asn Met Ser Leu Ile Pro Val Lys Ser Asn Asn Ser Asn Lys
1715 1720 1725

Pro Lys Phe Cys Ile Ser Gly Asn Thr Glu Ser Met Met Met Ser Thr
1730 1735 1740

Phe Ser Ser Lys Met His Ile Lys Ser Ser Thr Val Thr Thr Arg Phe
1745 1750 1755 1760

Asn Tyr Ser Lys Gln Asp Leu Tyr Asn Leu Phe Pro Ile Val Val Ile
1765 1770 1775

Asp Lys Ile Ile Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gln Leu
1780 1785 1790

Tyr Thr Thr Thr Ser His Gln Thr Ser Leu Val Arg Asn Ser Ala Ser
1795 1800 1805

Leu Tyr Cys Met Leu Pro Trp His His Val Asn Arg Phe Asn Phe Val
1810 1815 1820

Phe Ser Ser Thr Gly Cys Lys Ile Ser Ile Glu Tyr Ile Leu Lys Asp
1825 1830 1835 1840

Leu Lys Ile Lys Asp Pro Ser Cys Ile Ala Phe Ile Gly Glu Gly Ala
1845 1850 1855

Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp Ile Arg
1860 1865 1870

Tyr Ile Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro Ile
1875 1880 1885

Glu Phe Leu Arg Leu Tyr Asn Gly His Ile Asn Ile Asp Tyr Gly Glu
1890 1895 1900

Asn Leu Thr Ile Pro Ala Thr Asp Ala Thr Asn Asn Ile His Trp Ser
1905 1910 1915 1920

Tyr Leu His Ile Lys Phe Ala Glu Pro Ile Ser Ile Phe Val Cys Asp
1925 1930 1935

Ala Glu Leu Pro Val Thr Ala Asn Trp Ser Lys Ile Ile Ile Glu Trp
1940 1945 1950

Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys
1955 1960 1965

Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu
1970 1975 1980

Asp Asn Ile Thr Ile Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu
1985 1990 1995 2000

Lys Gly Ser Glu Val Tyr Leu Ile Leu Thr Ile Gly Pro Ala Asn Ile
2005 2010 2015

Leu Pro Val Phe Asp Val Val Gln Asn Ala Lys Leu Ile Leu Ser Arg
2020 2025 2030

Thr Lys Asn Phe Ile Met Pro Lys Lys Thr Asp Lys Glu Ser Ile Asp
2035 2040 2045

Ala Val Ile Lys Ser Leu Ile Pro Phe Leu Cys Tyr Pro Ile Thr Lys
2050 2055 2060

Lys Gly Ile Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Asn Gly
2065 2070 2075 2080

Asp Ile Leu Ser Tyr Ser Ile Ala Gly Arg Asn Glu Val Phe Ser Asn
2085 2090 2095

Lys Leu Ile Asn His Lys His Met Asn Ile Leu Lys Trp Leu Asp His
2100 2105 2110

Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met
2115 2120 2125

Ile Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr
2130 2135 2140

Thr Asn Glu Leu Lys Lys Leu Ile Lys Ile Thr Gly Ser Val Leu Tyr
2145 2150 2155 2160

Asn Leu Pro Asn Glu Gln
2165 
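
The three-letter residue codes used in the listing above can be checked and condensed mechanically. The following is a minimal sketch, assuming the standard twenty amino acid codes; the function name `to_one_letter` is hypothetical and is not part of the specification. Any token outside the standard set raises an error, which is a convenient way to flag a residue code damaged in reproduction.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(line):
    """Convert a whitespace-separated line of three-letter codes to one-letter codes.

    Raises ValueError on any token that is not a standard residue code.
    """
    letters = []
    for token in line.split():
        if token not in THREE_TO_ONE:
            raise ValueError(f"unrecognized residue code: {token!r}")
        letters.append(THREE_TO_ONE[token])
    return "".join(letters)


# The six residues that close the listing above.
print(to_one_letter("Asn Leu Pro Asn Glu Gln"))
```

Running the same function over each residue line, and counting sixteen residues per full line, gives a quick consistency check against the position numbers printed beneath the sequence.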



(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
CATATCACTC ACTCTGGGAT GGAG
(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 
TCAGAACATC AAGCACCGCC
(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
ACAGTCAAGA CTGAGATGAG 20



(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
AAGAGTCAGA TACATGTGGA 20 
(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
ACATGAATCA GCCTAAAGTC 20 
(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
CCGAAAGAGT TCCTGCGTTA CGACC
(2) INFORMATION FOR SEQ ID NO:41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
CAGTCCACAC AAGTACCAGG 
(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42: 
GTCAGAAGCT GTGGACCATC 
(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:
AATATTGCTA CAACAATGGC 20



(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:
ACTCTTCATT CCTAGACTGG 20
(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 
GTCCAATTAT GACTATGAAC 20 
(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:
AGAACAGACA TGAAGCTTGC 20



(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: 
CCAACAAGGA ATGCTTCTAG 20 
(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 
ACAGCACTAT CTATGATTGA CCTGG 25 
(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
GCAACATGGT TTACACATGC

(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
AGATTGAGAG TTGATCCAGG 
(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 
AGGAGATACT TAAACTAAGC 
(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
TAAGCTTATG CCTTTCAGCG 20



(2) INFORMATION FOR SEQ ID NO: 53:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 
TTAACGGACC TAAGCTGTGC 
(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 
GAAACAGATT ATTATGACGG 20 
(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 
CGGGCTATCT AGGTGAACTT CAGG
(2) INFORMATION FOR SEQ ID NO: 56:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 

ATTTGGATAT GGAATATGAG 20 

(2) INFORMATION FOR SEQ ID NO: 57:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:
ACTCAACTGA ACTACCAGTG 20
(2) INFORMATION FOR SEQ ID NO: 58:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58: 
AAGAACATCA TGTATTTCAG 20
(2) INFORMATION FOR SEQ ID NO: 59:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 
TTATCAACGC ACTGCTCATG
(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 
ATTTTCAGCA ATCACTTGGC ATGCC 
(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:
GCCTCTGTGC AAACAAGCTG 
(2) INFORMATION FOR SEQ ID NO: 62:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 
TCTCTAGTTA CTCTAGCAGC 
(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63:
AGGTCGTTGT TTGTGAGGAG
(2) INFORMATION FOR SEQ ID NO: 64:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64:
TCGTCCTCTT CTTTACTGTC
(2) INFORMATION FOR SEQ ID NO: 65:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65:
CCGTCCTCGA GCTAGCCTCG 20 
(2) INFORMATION FOR SEQ ID NO: 66:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66:
CTCCTCCAGG CTCACATTGG 20
(2) INFORMATION FOR SEQ ID NO: 67:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 
GGGTTGGTAC ATAGCTCTGC 20 
(2) INFORMATION FOR SEQ ID NO: 68: 
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 25 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68:
CACCCATCTG ATATTTCCCT GATGG
(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 
TGGTTGACAG TACAAATCTG 20 
(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 
CTGAAATGGG AAGATTGTGC 20 
(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71:
AGCAATCTAC ACTGCCTACC 20
(2) INFORMATION FOR SEQ ID NO: 72:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72:
TCACAGATGA TTCAATTATC 20 
(2) INFORMATION FOR SEQ ID NO:73: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:73: 
GATCCTAGAT ATAAGTTCTC 20 
(2) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74: 

ACCAAACAAA GTTGGGTAAG G

(2) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 32 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75: 
GGGGGATCCA TCCCTAATCC TGCTCTTGTC CC 32 
(2) INFORMATION FOR SEQ ID NO: 76: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76: 
GATTCCTCTG ATGGCTCCAC 
(2) INFORMATION FOR SEQ ID NO: 77: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: RNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77: 
TAACAGTCAA GGAGACCAAA G
(2) INFORMATION FOR SEQ ID NO: 78:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78: 
GGGAAGCTTA ACCCTAATCC TGCCCTAGGT GG 
(2) INFORMATION FOR SEQ ID NO: 79: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79: 
ACCAGACAAA GCTGGGAATA GA 
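
Each entry in the sequence listing declares a LENGTH, which can be compared against the bases actually printed. The following is a minimal sketch of that check, assuming the listing's convention of grouping bases in tens separated by spaces; the function name `check_declared_length` is hypothetical and is not part of the specification.

```python
def check_declared_length(sequence, declared):
    """Return True when the sequence contains only nucleotide letters
    and has exactly the declared number of bases.

    Spaces are ignored because the listing groups bases in blocks of ten.
    """
    bases = sequence.replace(" ", "").upper()
    only_nucleotides = set(bases) <= set("ACGTU")
    return only_nucleotides and len(bases) == declared


# SEQ ID NO: 75 (declared 32 base pairs) and SEQ ID NO: 79 (declared 22 base pairs).
seq_75 = "GGGGGATCCA TCCCTAATCC TGCTCTTGTC CC"
seq_79 = "ACCAGACAAA GCTGGGAATA GA"
print(check_declared_length(seq_75, 32))
print(check_declared_length(seq_79, 22))
```

A False result indicates either a miscounted entry or a character that is not a nucleotide letter, such as a digit or the letter O introduced during reproduction.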
What is claimed is:

1. An isolated, recombinantly-generated,
attenuated, nonsegmented, negative-sense, single
stranded RNA virus of the Order Mononegavirales having
at least one attenuating mutation in the 3' genomic
promoter region and having at least one attenuating
mutation in the RNA polymerase gene.

2. The virus of Claim 1 wherein the virus
is from the Family Paramyxoviridae.

3. The virus of Claim 2 wherein the virus
is from the Subfamily Paramyxovirinae.

4. The virus of Claim 3 wherein the virus
is from the Genus Morbillivirus.

5. The virus of Claim 4 wherein the virus 
is measles virus. 

6. The measles virus of Claim 5 wherein: 

(a) the at least one attenuating mutation in
the 3' genomic promoter region is
selected from the group consisting of
nucleotide 26 (A -> T), nucleotide 42 (A
-> T or A -> C) and nucleotide 96 (G ->
A), where these nucleotides are
presented in positive strand,
antigenomic, message sense; and

(b) the at least one attenuating mutation in
the RNA polymerase gene is selected from
the group consisting of nucleotide
changes which produce changes in an
amino acid selected from the group
consisting of residues 331 (isoleucine
-> threonine), 1409 (alanine ->
threonine), 1624 (threonine -> alanine),
1649 (arginine -> methionine), 1717
(aspartic acid -> alanine), 1936
(histidine -> tyrosine), 2074
(glutamine -> arginine) and 2114
(arginine -> lysine).

7. The virus of Claim 3 wherein the virus
is from the Genus Paramyxovirus.

8. The virus of Claim 7 wherein the virus
is human parainfluenza virus type 3 (PIV-3).

9. The PIV-3 of Claim 8 wherein:

(a) the at least one attenuating mutation in
the 3' genomic promoter region is
selected from the group consisting of
nucleotide 23 (T -> C), nucleotide 24 (C
-> T), nucleotide 28 (G -> T) and
nucleotide 45 (T -> A), where these
nucleotides are presented in positive
strand, antigenomic, message sense; and

(b) the at least one attenuating mutation in
the RNA polymerase gene is selected from
the group consisting of nucleotide
changes which produce changes in an
amino acid selected from the group
consisting of residues 942 (tyrosine ->
histidine), 992 (leucine ->
phenylalanine), 1292 (leucine ->
phenylalanine), and 1558 (threonine ->
isoleucine).

10. The virus of Claim 3 wherein the virus
is from the Genus Rubulavirus.

11. The virus of Claim 2 wherein the virus
is from the Subfamily Pneumovirinae.

12. The virus of Claim 11 wherein the virus
is from the Genus Pneumovirus.

13. The virus of Claim 12 wherein the virus
is human respiratory syncytial virus (RSV) subgroup B.

14. The virus of Claim 13 wherein: 

(a) the at least one attenuating mutation in
the 3' genomic promoter region is
selected from the group consisting of
nucleotide 4 (C -> G) and the insertion
of an additional A in the stretch of A's
at nucleotides 6-11, where these
nucleotides are presented in positive
strand, antigenomic, message sense; and

(b) the at least one attenuating mutation in
the RNA polymerase gene is selected from
the group consisting of nucleotide
changes which produce changes in an
amino acid selected from the group
consisting of residues 353 (arginine ->
lysine), 451 (lysine -> arginine), 1229
(aspartic acid -> asparagine), 2029
(threonine -> isoleucine) and 2050
(asparagine -> aspartic acid).

15. The virus of Claim 1 wherein the virus
is from the Family Rhabdoviridae.

16. The virus of Claim 1 wherein the virus 
is from the Family Filoviridae. 

17. A vaccine comprising an isolated,
recombinantly-generated, attenuated, nonsegmented,
negative-sense, single stranded RNA virus of the Order
Mononegavirales according to Claim 1 and a
physiologically acceptable carrier.

18. The vaccine of Claim 17 comprising a
measles virus according to Claim 5 and a
physiologically acceptable carrier.

19. The vaccine of Claim 18 comprising a
measles virus according to Claim 6 and a
physiologically acceptable carrier.

20. The vaccine of Claim 17 comprising a 
PIV-3 according to Claim 8 and a physiologically 
acceptable carrier. 

21. The vaccine of Claim 20 comprising a 
PIV-3 according to Claim 9 and a physiologically 
acceptable carrier. 

22. The vaccine of Claim 17 comprising an 
RSV subgroup B according to Claim 13 and a 
physiologically acceptable carrier. 

23. The vaccine of Claim 22 comprising an 
RSV subgroup B according to Claim 14 and a 
physiologically acceptable carrier. 

24. A method for immunizing an individual to 
induce protection against a nonsegmented, negative- 
sense, single stranded RNA virus of the Order 
Mononegavirales which comprises administering to the 
individual the vaccine of Claim 17. 

25. The method of Claim 24 wherein the 
vaccine is the vaccine of Claim 18. 

26. The method of Claim 25 wherein the 
vaccine is the vaccine of Claim 19. 

27. The method of Claim 24 wherein the 
vaccine is the vaccine of Claim 20. 

28. The method of Claim 27 wherein the 
vaccine is the vaccine of Claim 21. 

29. The method of Claim 24 wherein the 
vaccine is the vaccine of Claim 22. 

30. The method of Claim 29 wherein the
vaccine is the vaccine of Claim 23.

31. An isolated nucleic acid molecule
comprising a measles virus sequence in positive strand,
antigenomic message sense selected from the group
consisting of 1977 wild-type strain (SEQ ID NO:3), 1983
wild-type strain (SEQ ID NO:5) where the nucleotide
2499 is G or C, Montefiore wild-type strain (SEQ ID
NO:7), Rubeovax™ vaccine strain (SEQ ID NO:9), where
the nucleotide 2143 is T or C, Moraten vaccine strain
(SEQ ID NO:11), Schwarz vaccine strain (SEQ ID NO:11),
where the nucleotide 4917 is C and the nucleotide 4924
is C, and Zagreb vaccine strain (SEQ ID NO:13), and the
complementary genomic sequences thereof.

32. An isolated nucleic acid molecule
comprising a PIV-3 sequence in positive strand,
antigenomic message sense selected from the group
consisting of cp45 vaccine strain grown in fetal rhesus
lung cells (SEQ ID NO:19) and cp45 vaccine strain grown
in Vero cells (SEQ ID NO:21), and the complementary
genomic sequences thereof.

33. A composition which comprises a 
transcription vector comprising an isolated nucleic 
acid molecule encoding a genome or antigenome of a 
nonsegmented, negative-sense, single stranded RNA virus 
of the Order Mononegavirales having at least one 
attenuating mutation in the 3' genomic promoter region 
and having at least one attenuating mutation in the RNA 
polymerase gene, together with at least one expression 
vector which comprises at least one isolated nucleic 
acid molecule encoding the trans-acting proteins
necessary for encapsidation, transcription and
replication, whereby upon expression an infectious
attenuated virus is produced.

34. The composition of Claim 33 wherein the
transcription vector comprises an isolated nucleic acid
molecule which encodes a measles virus according to
Claim 5 and the at least one expression vector
comprises at least one isolated nucleic acid molecule
encoding the trans-acting proteins N, P and L.

35. The composition of Claim 34 wherein the 
transcription vector comprises an isolated nucleic acid 
molecule which encodes a measles virus according to 
Claim 6. 

36. The composition of Claim 33 wherein the 
transcription vector comprises an isolated nucleic acid 
molecule which encodes a PIV-3 according to Claim 8 and 
the at least one expression vector comprises at least 
one isolated nucleic acid molecule encoding the trans- 
acting proteins NP, P and L. 

37. The composition of Claim 36 wherein the 
transcription vector comprises an isolated nucleic acid 
molecule which encodes a PIV-3 according to Claim 9. 

38. The composition of Claim 33 wherein the 
transcription vector comprises an isolated nucleic acid 
molecule which encodes an RSV subgroup B according to 
Claim 13 and the at least one expression vector 
comprises at least one isolated nucleic acid molecule 
encoding the trans-acting proteins N, P, L and M2.

39. The composition of Claim 38 wherein the
transcription vector comprises an isolated nucleic acid
molecule which encodes an RSV subgroup B according to
Claim 14.

40. A method for producing infectious
attenuated nonsegmented, negative-sense, single
stranded RNA virus of the Order Mononegavirales which
comprises transforming or transfecting host cells with
the at least two vectors of Claim 33 and culturing the
host cells under conditions which permit the co-
expression of these vectors so as to produce the
infectious attenuated virus.

41. The method of Claim 40 wherein the virus 
is the measles virus of Claim 5. 

42. The method of Claim 41 wherein the virus 
is the measles virus of Claim 6. 

43. The method of Claim 40 wherein the virus 
is the PIV-3 of Claim 8. 

44. The method of Claim 43 wherein the virus 
is the PIV-3 of Claim 9. 

45. The method of Claim 40 wherein the virus 
is the RSV subgroup B of Claim 13. 

46. The method of Claim 45 wherein the virus
is the RSV subgroup B of Claim 14.
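
The selection rule recited for measles virus in Claim 6, namely at least one listed change in the 3' genomic promoter region together with at least one listed change in the RNA polymerase gene, can be expressed as a simple set test. The following is an illustrative sketch only: the position lists are copied from Claim 6, the function name `has_both_mutation_classes` is hypothetical, and the check looks only at positions, not at which nucleotide or amino acid is substituted.

```python
# Positions copied from Claim 6; the substitution itself is not checked here.
MEASLES_PROMOTER_POSITIONS = {26, 42, 96}
MEASLES_L_RESIDUES = {331, 1409, 1624, 1649, 1717, 1936, 2074, 2114}


def has_both_mutation_classes(promoter_positions, l_residues):
    """Return True when at least one listed promoter position and at least
    one listed polymerase (L) residue appear among the observed changes."""
    in_promoter = MEASLES_PROMOTER_POSITIONS & set(promoter_positions)
    in_polymerase = MEASLES_L_RESIDUES & set(l_residues)
    return bool(in_promoter) and bool(in_polymerase)


# Changes in both regions, then changes in the promoter region only.
print(has_both_mutation_classes([26], [331, 2114]))
print(has_both_mutation_classes([26, 42], []))
```

The same pattern would apply to the PIV-3 and RSV subgroup B claims by substituting the positions recited in Claims 9 and 14.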